(57) Abstract: A behavioral adaptation engine integrated with a voice application creation and deployment system has at least one
data input port for receiving XML-based client interaction data including audio files attached to the data; at least one data port for
sending data to and receiving data from external data systems and modules; a logic processing component including an XML reader,
voice player, and analyzer for processing received data; and a decision logic component for processing result data against one or
more constraints. The engine intercepts client data including dialog from client interaction with a served voice application in real
time and processes the received data for behavioral patterns and, if attached, voice characteristics of the audio files, whereupon the
engine, according to the results and one or more valid constraints, identifies one or a set of possible enterprise responses for return to
the client during interaction.

Behavioral Adaptation Engine for Discerning Behavioral Characteristics of Callers
Interacting with An VXML-Compliant 
Voice Application 

by inventors

Leo Chiu 

Field of the Invention 

The present invention is in the area of software application development and
pertains particularly to methods and apparatus for discerning behavioral characteristics of
users interacting with a voice application for distribution on a VXML application
deployment architecture using a behavioral adaptation engine.

Cross-Reference to Related Documents

The present invention claims priority as a continuation in part of a U.S. patent
application, serial number 10/190,080, entitled "Method and Apparatus for Improving
Voice recognition performance in a voice application distribution system", filed on
07/02/2002, which is a continuation in part of U.S. patent application serial number
10/173,333, entitled "Method for Automated Harvesting of Data from A Web site
using a Voice Portal System", filed on 06/14/2002, which claims priority to provisional
application serial number 60/302,736. The instant application claims priority to the
above-mentioned applications in their entirety by reference.

Background of the Invention 



A speech application is one of the most challenging applications to develop,
deploy and maintain in a communications (typically telephony) environment. Expertise
required for developing and deploying a viable application includes expertise in computer
telephony integration (CTI) hardware and software, voice recognition software,
text-to-speech software, and speech application logic.

With the relatively recent advent of voice extensible markup language (VXML),
the expertise required to develop a speech solution has been reduced somewhat. VXML is
a language that enables a software developer to focus on the application logic of the voice
application without being required to configure underlying telephony components.
Typically, the developed voice application is run on a VXML interpreter that resides on
and executes on the associated telephony system to deliver the solution.
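
By way of illustration only, a minimal VXML page of the kind such an interpreter executes might be generated as sketched below. The function name build_vxml_page, the form id, and the prompt text are hypothetical and do not appear in this specification; only the vxml, form, block, and prompt elements of VoiceXML 2.0 are assumed.

```python
import xml.etree.ElementTree as ET


def build_vxml_page(form_id: str, prompt_text: str) -> str:
    """Return a minimal VoiceXML 2.0 document that plays a single prompt.

    Hypothetical helper for illustration; not part of the described system.
    """
    root = ET.Element(
        "vxml",
        {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"},
    )
    form = ET.SubElement(root, "form", {"id": form_id})
    block = ET.SubElement(form, "block")      # executable content of the form
    prompt = ET.SubElement(block, "prompt")   # text the interpreter will speak
    prompt.text = prompt_text
    return ET.tostring(root, encoding="unicode")


page = build_vxml_page("greeting", "Welcome to the voice portal.")
print(page)
```

The developer supplies only the dialog content; telephony details are left to the interpreter, which is the division of labor the paragraph above describes.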

As is shown in Fig. 1A (prior art), a typical architecture of a VXML-compliant
telephony system comprises a voice application server (110) and a VXML-compliant
telephony server (130). Typical steps for development and deployment of a VXML-enabled
IVR solution are briefly described below using the elements of Fig. 1A.

Firstly, a new application database (113) is created or an existing one is modified
to support VXML. Application logic 112 is designed in terms of workflow and adapted
to handle the routing operations of the IVR system. VXML pages, which are results of
functioning application logic, are rendered by a VXML rendering engine (111) based on a
specified generation sequence.

Secondly, an object facade to server 130 is created comprising the corresponding
VXML pages and is sent to server 130 over a network (120), which can be the Internet, an
Intranet, or an Ethernet network. The VXML pages are integrated into rendering engine
111 such that they can be displayed according to set workflow at server 110.

Thirdly, the VXML-telephony server 130 is configured to enable proper retrieval
of specific VXML pages from rendering engine 111 within server 110. A triggering
mechanism is provided to server 110 so that when a triggering event occurs, an
appropriate outbound call is placed from server 110.

A VXML interpreter (131), a voice recognition text-to-speech engine (132), and
the telephony hardware/software (133) are provided within server 130 and comprise
server function. In prior art, the telephony hardware/software 133 along with the VXML
interpreter 131 are packaged as an off-the-shelf IVR-enabling technology. Arguably the
most important feature of the entire system, however, is the application server 110. The
application logic (112) is typically written in a programming language such as Java and
packaged as an enterprise Java Bean archive. The presentation logic required is handled
by rendering engine 111 and is written in JSP or PERL.

An enhanced voice application system is known to the inventor and disclosed in
the U.S. patent application entitled "Method and Apparatus for Development and
Deployment of a Voice Software Application for Distribution to one or more
Application Consumers" to which this application claims priority. That system uses a
voice application server that is connected to a data network for storing and serving voice
applications. The voice application server has a data connection to a network
communications server connected to a communications network such as the well-known
PSTN network. The communication server routes the created voice applications to their
intended recipients.

A computer station is provided as part of the system and is connected to the data
network and has access to the voice application server. A client software application is
hosted on the computer station for the purpose of enabling users to create applications
and manage their states. In this system, the user operates the client software hosted on the
computer station in order to create voice applications through object modeling and
linking. The applications, once created, are then stored in the application server for
deployment. The user can control and manage deployment and state of deployed
applications including scheduled deployment and repeat deployments in terms of intended
recipients.

In one embodiment, the system is adapted for developing and deploying a voice
application using Web-based data as source data over a communications network to one
or more recipients. The enhanced system has a voice application server capable, through
software and network connection, of accessing a network server and a Web site hosted
therein and of pulling data from the site. The computer station running voice
application software has control access to at least the voice application server and is also
capable of accessing the network server and Web site. An operator of the computer
station creates and provides templates for the voice application server to use in
data-to-voice rendering. In this aspect, Web data can be harvested from a Web-based data
source and converted to voice for delivery as dialogue in a voice application.

In another embodiment, a method is available in the system described above for
organizing, editing, and prioritizing the Web-based data before dialog creation is
performed. The method includes harvesting the Web-based data source in the form of its
original structure; generating an object tree representing the logical structure and content
type of the harvested, Web-based data source; manipulating the object tree generated to a
desired hierarchical structure and content; creating a voice application template in VXML
and populating the template with the manipulated object tree; and creating a voice
application capable of accessing the Web-based data source according to the constraints
of the template. The method streamlines voice application deployment and
executed state and simplifies the development process of the voice application.
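
The object-tree steps of the method above might be sketched as follows, purely as an illustration. The Node class, the prune and to_menu_prompts functions, the content-type labels, and the sample site are all hypothetical; the specification does not prescribe a data model or a pruning rule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One element of a harvested site's logical structure (hypothetical model)."""
    label: str
    content_type: str                       # e.g. "menu", "text", "image"
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, keep_types: set, max_depth: int, depth: int = 0) -> Node:
    """Copy the tree, dropping unwanted content types and levels below max_depth."""
    kept = []
    if depth < max_depth:
        kept = [
            prune(child, keep_types, max_depth, depth + 1)
            for child in node.children
            if child.content_type in keep_types
        ]
    return Node(node.label, node.content_type, kept)


def to_menu_prompts(node: Node) -> List[str]:
    """Flatten the manipulated tree into one spoken menu prompt per branching node."""
    prompts = []
    if node.children:
        options = ", ".join(child.label for child in node.children)
        prompts.append(f"{node.label}: say {options}.")
        for child in node.children:
            prompts.extend(to_menu_prompts(child))
    return prompts


# Assumed example of a harvested structure.
site = Node("Bank home", "menu", [
    Node("Accounts", "menu", [Node("Balance", "text"), Node("Logo", "image")]),
    Node("Banner ad", "image"),
    Node("Loans", "text"),
])
simplified = prune(site, keep_types={"menu", "text"}, max_depth=2)
prompts = to_menu_prompts(simplified)
```

Here the image nodes are dropped because they have no useful voice rendering, and the remaining hierarchy drives the prompts that would populate a template.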

A security regimen is provided for the above-described system. The protocol
provides transaction security between a Web server and data and a voice portal system
accessible through a telephony network on the user end and through an XML gateway on
the data source end. The regimen includes one of a private connection, a virtual private
network, or a secure socket layer, set up between the Web server and the Voice Portal
system through the XML gateway. Transactions carried on between the portal and the
server or servers enjoy the same security that is available between secure nodes on the
data network. In one embodiment, the regimen further includes a voice translation
system distributed at the outlet of the portal and at the telephone of the end user, wherein
the voice dialog is translated to an obscure language that is not the user's language and
then retranslated to the user's language at the telephone of the user.

In such a system, where templates are used to enable voice application dialog
transactions, voice application rules and voice recognition data are consulted for the
appropriate content interpretation and response protocol so that the synthesized voice
presented as response dialog through the voice portal to the user is both appropriate in
content and, ideally, error free in expression. The database is therefore optimized with
vocabulary words that enable a very wide range of speech covering many different
vocabulary words akin to many differing business scenarios.

According to yet another aspect of the invention, vocabulary recognition is
tailored for active voice applications according to client parameters. This is
accomplished through a vocabulary management system adapted to constrain voice
recognition processing associated with text-to-speech and speech-to-text rendering
during use of an active voice application by a user accessing a
data source through a voice portal. The enhancement includes a vocabulary management
server connected to a voice application server and to a telephony server, and an instance
of vocabulary management software running on the management server for enabling
vocabulary establishment and management for voice recognition software. In practice of
the enhanced vocabulary management capability, an administrator accessing the
vocabulary management server uses the vocabulary management software to create
unique vocabulary sets or lists that are specific to selected portions of vocabulary
associated with target data sources, the vocabulary sets differing in content according to
administrator direction.
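
One way to picture such vocabulary constraint is the sketch below, offered only as an assumption-laden illustration. The VOCABULARY_SETS table, its contents, and the constrain_candidates function are hypothetical; an actual recognizer would be constrained through its own grammar mechanism rather than by filtering a word list.

```python
from typing import Dict, List, Set

# Hypothetical vocabulary sets an administrator might define per target data source.
VOCABULARY_SETS: Dict[str, Set[str]] = {
    "banking": {"balance", "transfer", "loan", "statement"},
    "travel": {"flight", "hotel", "booking", "transfer"},
}


def constrain_candidates(candidates: List[str], data_source: str) -> List[str]:
    """Keep only recognition candidates found in the active vocabulary set.

    The recognizer's original ranking order is preserved.
    """
    active = VOCABULARY_SETS.get(data_source, set())
    return [word for word in candidates if word.lower() in active]


ranked = ["Ballots", "balance", "valance"]   # assumed recognizer output, best first
banking_matches = constrain_candidates(ranked, "banking")
```

Restricting candidates to the set tied to the data source in use discards acoustically similar words that make no sense in that context.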

It will be appreciated by one with skill in the art of voice application deployment 
architecture that many users vying to connect and interact with a voice portal may in 
some cases create a bottleneck wherein data lines connecting voice application 
components to Web-sources and other data sources become taxed to their capacities. 
This problem may occur especially at peak use periods as is common for many normal 
telephony environments. It has occurred to the inventor that still more streamlining in 
terms of traffic optimization is required to alleviate potential line-use issues described 
above. 

A particular enhancement to the voice application distribution system known to
the inventor addresses the traffic challenges described in the above paragraph.
Application logic is provided for determining which portions (dialogs) of a voice
application for deployment are cached at an application-receiving end system based on
static and dynamic rules; in some cases (dynamic caching), statistical analysis results
are used in the determination. The application logic utilizes a processor for processing
the voice application according to sequential dialog files and rules of the application.
Logic components include a static content optimizer connected to the processor for
identifying files containing static content, and a dynamic content optimizer connected to
the processor for identifying files containing dynamic content. The optimizers determine
which files should be cached at which end-system facilities, tag the files accordingly, and
prepare those files for distribution to selected end-system cache facilities for local
retrieval during consumer interaction with the deployed application.
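
The tagging performed by the two optimizers might look like the following sketch. It is illustrative only: the DialogFile fields, the tag strings, and the request-share threshold standing in for "statistical analysis results" are assumptions, not details taken from this specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DialogFile:
    """A dialog file of a voice application (hypothetical representation)."""
    name: str
    has_dynamic_content: bool
    request_share: float = 0.0   # assumed statistic: fraction of sessions requesting it


def tag_for_caching(files: List[DialogFile], min_share: float = 0.5) -> Dict[str, str]:
    """Tag each dialog file with a caching decision.

    Static files are always tagged for caching; dynamic files are tagged only
    when the assumed usage statistic meets the threshold.
    """
    tags: Dict[str, str] = {}
    for f in files:
        if not f.has_dynamic_content:
            tags[f.name] = "cache:static"       # static content optimizer
        elif f.request_share >= min_share:
            tags[f.name] = "cache:dynamic"      # dynamic content optimizer
        else:
            tags[f.name] = "no-cache"
    return tags


files = [
    DialogFile("greeting.vxml", has_dynamic_content=False),
    DialogFile("balance.vxml", has_dynamic_content=True, request_share=0.7),
    DialogFile("promo.vxml", has_dynamic_content=True, request_share=0.1),
]
tags = tag_for_caching(files)
```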

Being able to retrieve dialog portions of a voice application from a local cache
facility improves response time at the voice portal by decreasing the load on the network
connection to the voice application server. However, in addition to reduced traffic
requirements, it is also important that text-to-speech and speech-to-text
renderings are clear and accurate. Accuracy of synthesized speech delivered to a caller is
key to creating a successful voice application that can be interacted with in a dynamic
fashion at both ends.

As voice application distribution architectures expand to cross regional
boundaries and even cultural boundaries, the prospect of standardizing speech recognition
rules dealing with terms and phrases that are commonly spoken becomes increasingly
difficult. For example, pronunciations of certain terms in the same language will vary
significantly according to region. Common labels such as the way major roads and
highways are written and spoken can also vary significantly. There are many examples of
phrase and term variations that need to be addressed if voice application interaction is
practiced on larger architectures spanning large geographic regions.

In yet another system enhancement known to the inventor, text-to-speech
preprocessing is used to render synthesized voice that is somewhat personalized to a
caller according to pre-set constraints. The enhanced system is capable of preprocessing
text strings for VXML view generation and subsequent voice rendering. The system has
a text-to-speech preprocessing logic and a software table accessible to the preprocessing
logic, the table adapted to serve text dialog options related to one or more text entities.

A rules base is provided and accessible to the preprocessing logic. The rules base
is adapted to serve dialog selection constraints used to match specific dialog portions that
are then used to annotate a text string. Dialog options and text entities are stored in an
accessible data store. In a preferred embodiment the preprocessing logic accesses the
software table during client interaction with a deployed voice application, selects a
specific dialog option from more than one dialog option related to a single text entity, and
inserts the selected option into the VXML page rendering process. The selection is made
according to return of one or more of the served constraints.
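
The table lookup and constraint matching described above might be sketched as follows. The table contents, the "region" constraint key, the spoken renderings, and the function names are hypothetical examples chosen for illustration; the specification does not define the table layout.

```python
from typing import Dict, List, Tuple

# Hypothetical table: text entity -> candidate spoken renderings, each paired with
# the constraints that must hold for it to be selected. An empty dict is a default.
DIALOG_TABLE: Dict[str, List[Tuple[str, Dict[str, str]]]] = {
    "HWY 101": [
        ("the one-oh-one", {"region": "southern_california"}),
        ("highway one-oh-one", {"region": "northern_california"}),
        ("highway one hundred one", {}),
    ],
}


def select_option(entity: str, caller: Dict[str, str]) -> str:
    """Return the first dialog option whose constraints all match the caller."""
    for spoken, constraints in DIALOG_TABLE.get(entity, []):
        if all(caller.get(key) == value for key, value in constraints.items()):
            return spoken
    return entity   # no table entry: leave the text entity unchanged


def annotate(text: str, caller: Dict[str, str]) -> str:
    """Replace each known text entity in the string with the selected option."""
    for entity in DIALOG_TABLE:
        text = text.replace(entity, select_option(entity, caller))
    return text


regional = annotate("Take HWY 101 north.", {"region": "northern_california"})
fallback = annotate("Take HWY 101 north.", {})
```

The annotated string would then pass on to VXML page rendering in place of the original text.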

While the enhanced system provides personalization of voice dialog to specific
groups of callers depending upon pre-set constraints, which may cover a wide variety of
industry-specific, social, geographic and cultural considerations, the system is still largely
robotic and does not respond to individual attitudes and behaviors. It has occurred to the
inventor that instant attitudes, moods, and behaviors of callers interacting with a voice
application, if understood at the time of interaction, could be leveraged to increase
customer satisfaction, enterprise sales figures, and efficiency of the interaction process in
general.

What is clearly needed is a method and apparatus for discerning, in real time,
specific attitudes and behaviors of individual clients interacting with a voice application
on a case-by-case basis, and for responding to such discernments during interaction by
delivering selected dialog options, including provision of options not normally provided
with the instant application and/or exclusion of certain options normally provided with
the instant application. A system such as this could increase client satisfaction, enterprise
success ratios, and efficiency of interaction, and could also improve conclusion ratios of
interactions over time of deployment.

Summary of the Invention 

In a preferred embodiment of the present invention a voice application creation
and deployment system is provided, comprising a voice application server for serving
voice applications to clients over a data network, at least one voice portal node having
access to the data network, the portal node for facilitation of client interaction with the
voice applications, and a behavioral adaptation engine executable from the application
server. The system is characterized in that the behavioral adaptation engine intercepts
client responses during voice interaction with a served application, analyzes them for one
or a combination of behavior patterns and mood states according to pre-existing
constraints and received client information, and determines which of a set of possible
dialog responses, including linked-to or insertable options, will be submitted for VXML
page rendering to create a next enterprise response played to the client.

In some preferred embodiments of the system the data network is the Internet
network. In others the data network may be a combination of the Internet and a telephony
network. In some cases the behavioral adaptation engine is part of the application logic of
the voice application server. In other cases the at least one voice portal is an interactive
voice response system combined with a telephony server. In some embodiments the at
least one voice portal is a computerized node connected to a data network having access
to the Internet.

In some alternative embodiments the behavioral adaptation engine analyzes audio 
files recorded at the at least one voice portal and sent to the application server as digital 
audio files attached to client responses, while in others the behavioral adaptation engine 
executes upon receipt of a trigger event. In some other embodiments the constraints are 
related to one or a combination of menu navigation behavior or perceived mood state of 
the client. In yet other embodiments the dialog responses and linked options are stored in 
a data store and are accessible to the behavioral adaptation engine. In still other 
embodiments the received client information includes one or a combination of line 
identification, number identification, client history data, voice imprint results, and 
recorded voice samples. 

In still further embodiments voice sampling is used to discern mood. Also in
some further embodiments received client information is used in conjunction with voice
analysis to determine a response. In some cases the behavioral adaptation engine detects
voice inflection variances and volume characteristics of sampled audio to facilitate mood
discernment of a client. And in yet other embodiments the variances and volume
characteristics of an interaction are collected over multiple interactions with a same
application to develop statistics used in gauging enterprise response probability values.

In yet another aspect of the invention a behavioral adaptation engine integrated 
with a voice application creation and deployment system is provided comprising at least 
one data input port for receiving XML-based client interaction data including audio files 
attached to the data, at least one bi-directional data port for sending data to and receiving 
data from external data systems and modules, a logic processing component including an 
XML reader and voice player and analyzer for processing received data, and a decision 
logic component for processing result data against one or more constraints. The engine is 
characterized in that the behavioral adaptation engine intercepts client data including 
dialog from client interaction with a served voice application in real time and processes 
the received data for behavioral patterns and if attached, voice characteristics of the audio 
files whereupon the engine according to the results and one or more constraints identifies 
one or a set of possible enterprise responses for return to the client during interaction. 

In some embodiments the engine is hosted in a voice application server. In other
embodiments the server is hosted on the Internet network. In some cases the voice
application and deployment system includes at least one voice portal for facilitation of
client access to voice applications. In still other cases the engine is executed to function
upon receipt of a trigger event. In yet other cases the constraints are related to one or a
combination of menu navigation behavior or perceived mood state of the client.

In some embodiments data from external data resources is used as additional input
data for decision processing. In others the received client data includes one or a
combination of line identification, number identification, client history data, and voice
imprint results. In still others voice sampling is used to discern mood state. In some
cases the voice analyzer detects voice inflection variances and volume characteristics of
sampled audio to facilitate mood discernment of a client, and in other cases the variances
and volume characteristics of an interaction are collected over multiple interactions with a
same application to develop statistics used in gauging enterprise response probability
values.

In still another aspect of the invention a method for identifying an appropriate one
or set of a plurality of voice application dialog responses to data input resulting from a
client interaction with a voice application is provided, comprising (a) receiving the data
input during run of the voice application; (b) interpreting the data input; (c) analyzing the
input for validity of one or more constraints; (d) comparing the analyzed results with
additional external data; (e) analyzing the comparison results for continued validity of the
one or more constraints; and (f) identifying one or more available response options
according to the valid constraints.
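
Steps (a) through (f) above might be sketched as follows, as an illustration under stated assumptions rather than the claimed implementation. The feature names, the numeric thresholds, the constraint and response names, and the use of a per-client baseline volume as the external data are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

DEFAULT_BASELINE = 0.5   # assumed neutral speaking volume on a 0..1 scale

# Hypothetical constraints: each one tests interpreted features of the input.
CONSTRAINTS: Dict[str, Callable[[dict], bool]] = {
    "frustrated": lambda f: f["volume_delta"] > 0.3 or f["repeat_count"] >= 3,
    "expert_navigator": lambda f: f["barge_in"],
}

# Hypothetical response options made available by each valid constraint.
RESPONSES: Dict[str, List[str]] = {
    "frustrated": ["offer_live_agent"],
    "expert_navigator": ["short_menu"],
}
DEFAULT_RESPONSES = ["standard_menu"]


def interpret(raw: dict, baseline: float = DEFAULT_BASELINE) -> dict:
    """(b) Turn raw interaction data into simple features."""
    samples = raw.get("volume_samples") or [baseline]
    mean_volume = sum(samples) / len(samples)
    return {
        "volume_delta": mean_volume - baseline,
        "repeat_count": raw.get("repeat_count", 0),
        "barge_in": raw.get("barge_in", False),
    }


def identify_responses(raw: dict, external: Optional[dict] = None) -> List[str]:
    features = interpret(raw)                                      # (a), (b)
    valid = [n for n, test in CONSTRAINTS.items() if test(features)]   # (c)
    if external is not None:
        # (d) compare against external data, here the client's usual volume
        adjusted = interpret(raw, external.get("baseline_volume", DEFAULT_BASELINE))
        # (e) keep only the constraints that remain valid after the comparison
        valid = [n for n in valid if CONSTRAINTS[n](adjusted)]
    options = [opt for n in valid for opt in RESPONSES[n]]         # (f)
    return options or list(DEFAULT_RESPONSES)


loud_call = {"volume_samples": [0.9, 0.9], "repeat_count": 0, "barge_in": False}
first_guess = identify_responses(loud_call)
with_history = identify_responses(loud_call, {"baseline_volume": 0.85})
```

In this sketch a loud caller is first treated as frustrated, but the constraint is dropped once history shows the caller habitually speaks loudly, which mirrors the re-validation of steps (d) and (e).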

In some embodiments of this method the voice application is VXML compliant.
In other embodiments in step (a) the data input includes client identification data, client
dialog data, and digital audio sampled from the dialog. In still other embodiments steps
(d) and (e) are optional steps. In still other embodiments in step (c) the constraint validity
lends to indication of mood state determination of the client, and in others in step (c) the
constraint validity lends to indication of behavioral state determination of the client, and
in some others in step (c) the constraint validity lends to an indication of both mood state
and behavioral state of the client.

In some cases in step (d) the external data includes statistical data resulting from
past interactions with the same dialog of the same application. In other cases in step
(f) identification of one or more available response options includes submitting the one or
more response options to an external module for further processing and narrower
selection. In still other cases the external module is a text-to-speech pre-processor.

Brief Description of the Drawing Figures 

Fig. 1A is a block diagram illustrating a basic architecture of a VXML-enabled
IVR development and deployment environment according to prior art.

Fig. 1B is a block diagram illustrating the basic architecture of Fig. 1A enhanced
to practice the present invention.

Fig. 2 is a process flow diagram illustrating steps for creating a voice application
shell or container for a VXML voice application according to an embodiment of the
present invention.

Fig. 3 is a block diagram illustrating a simple voice application container 
according to an embodiment of the present invention. 

Fig. 4 is a block diagram illustrating a dialog object model according to an
embodiment of the present invention.

Fig. 5 is a process flow diagram illustrating steps for voice dialog creation for a
VXML-enabled voice application according to an embodiment of the present invention.

Fig. 6 is a block diagram illustrating a dialog transition flow after initial 
connection with a consumer according to an embodiment of the present invention. 

Fig. 7 is a plan view of a developer's frame containing a developer's login screen
according to an embodiment of the present invention.

Fig. 8 is a plan view of a developer's frame containing a screen shot of a home 
page of the developer's platform interface of Fig. 7. 

Fig. 9 is a plan view of a developer's frame containing a screen shot of an address
book 911 accessible through interaction with the option Address in section 803 of the
previous frame of Fig. 8.

Fig. 10 is a plan view of a developer's frame displaying a screen 1001 for creating 
a new voice application. 

Fig. 11 is a plan view of a developer's frame illustrating the screen of Fig. 10 showing
further options as a result of scrolling down.

Fig. 12 is a screen shot of a dialog configuration window illustrating a dialog
configuration page according to an embodiment of the invention.

Fig. 13 is a screen shot 1300 of the dialog design panel of Fig. 12 illustrating
progression of dialog state to a subsequent contact.

Fig. 14 is a screen shot of a thesaurus configuration window activated from the
example of Fig. 13 according to a preferred embodiment.

Fig. 15 is a plan view of a developer's frame illustrating a screen for managing
created modules according to an embodiment of the present invention.

Fig. 16 is a block diagram of the dialog transition flow of Fig. 6 enhanced for
Web harvesting according to an embodiment of the present invention.

Fig. 17 is a block diagram of the voice application distribution environment of
Fig. 1B illustrating added components for automated Web harvesting and data rendering
according to an embodiment of the present invention.

Fig. 18 is a block diagram illustrating a Web-site logical hierarchy harvested and
created as an object model.

Fig. 19 is a block diagram illustrating the model of Fig. 18 being manipulated to
simplify the model for economic rendering.

Fig. 20 is a process flow diagram illustrating intermediary steps for reducing
complexity of a Web-site logical tree.

Fig. 21 is a block diagram illustrating a secure connectivity between a Voice 
Portal and a Web server according to an embodiment of the invention. 

Fig. 22 is a block diagram illustrating the architecture of Fig. 1B enhanced with a
vocabulary management server and software according to an embodiment of the present
invention.

Fig. 23 is a block diagram illustrating various functional components of a VXML 
application architecture including cache optimization components according to an 
embodiment of the present invention. 

Fig. 24 is a process flow diagram illustrating steps for practice of the present
invention.

Fig. 25 is a block diagram of the VXML architecture of Fig. 23 enhanced with a
text-to-speech preprocessor according to an embodiment of the present invention.

Fig. 26 is a block diagram illustrating possible variances of speech renderings of a
text string.

Fig. 27 is a block diagram illustrating an organized mapping table according to an
embodiment of the present invention.

Fig. 28 is a block diagram of the VXML architecture of Fig. 23 enhanced with a 
behavioral adaptation engine according to an embodiment of the present invention. 

Fig. 29 is a process flow diagram illustrating user interaction with the system of 
Fig. 28 according to one embodiment of the invention. 

Fig. 30 is a process flow diagram illustrating user interaction with the system of 
Fig. 28 according to another embodiment of the invention. 

Fig. 31 is a process flow diagram illustrating user interaction with the system of
Fig. 28 according to yet another embodiment.

Description of the Preferred Embodiments 

According to preferred embodiments of the present invention, the inventor teaches 
herein, in an enabling fashion, a novel system for developing and deploying real-time 
dynamic or static voice applications in an object-oriented way that enables inbound or 
outbound delivery of IVR and other interactive voice solutions in supported 
communications environments. 

Fig. 1A is a block diagram illustrating a basic architecture of a VXML-enabled
IVR development and deployment environment according to prior art. As described with
reference to the background section, the prior-art architecture of this example is known to
and available to the inventor. Developing and deploying voice applications for the
illustrated environment, which in this case is a telephony environment, requires a very
high level of skill in the art. Elements of this prior-art example that have already been
introduced with respect to the background section of this specification shall not be
re-introduced.

In this simplified scenario, voice application server 110 utilizes database/resource
adapter 113 for accessing a database or other resources for content. Application logic 112
comprising VXML script, business rules, and underlying telephony logic must be
carefully developed and tested before single applications can be rendered by rendering
engine 111. Once voice applications are complete and servable from server 110, they can
be deployed through data network 120 to telephony server 130 where interpreter 131 and
text-to-speech engine 132 are utilized to formulate and deliver the voice application in
useable or playable format for telephony software and hardware 133. The applications
are accessible to a receiving device, illustrated herein as device 135, a telephone, through
the prevailing network 134, which is in this case a public-switched-telephone-network
(PSTN) linking the telephony server to the consumer (device 135) generally through a
telephony switch (not shown).

WO 2005/006116    PCT/US2004/015461

Improvements to this prior-art example in embodiments of the present invention
concern and are focused in the capabilities of application server 110 with respect to
development and deployment issues and with respect to overall enhancement to response
capabilities and options in interaction dialog that is bi-directional. Using the description
of existing architecture deemed state-of-art architecture, the inventor herein describes
additional components that are not shown in the prior-art example of Fig. 1A, but are
illustrated in a novel version of the example represented herein by Fig. 1B.

Fig. 1B is a block diagram illustrating the basic architecture of Fig. 1A enhanced
to illustrate an embodiment of the present invention. Elements of the prior-art example of
Fig. 1A that are also illustrated in Fig. 1B retain their original element numbers and are
not re-introduced. For reference purposes an entity (a person) that develops a voice
application shall be referred to hereinafter in this specification as either a producer or
developer.

A developer or producer of a voice application according to an embodiment of the 
present invention operates preferably from a remote computerized workstation illustrated 
herein as station 140. Station 140 is essentially a network-connected computer station.
Station 140 may be housed within the physical domain also housing application server
110. In another embodiment, station 140 and application server 110 may reside in the
same machine. In yet another embodiment, a developer may operate station 140 from his
or her home office or from any network-accessible location including any wireless
location. 

Station 140 is equipped with a client software tool (CL) 141, which is adapted to 
enable the developer to create and deploy voice applications across the prevailing system 
represented by servers 110, 130, and by receiving device 135. CL 141 is a Web interface
application similar to or incorporated with a Web browser application in this example;
however, other network situations may apply instead. CL 141 contains the software tools
required for the developer to enable enhancements according to embodiments of the
invention. Station 140 is connected to a voice portal 143 that is maintained either on the
data network (Internet, Ethernet, Intranet, etc.) and/or within telephony network 134. In
this example portal 143 is illustrated logically in both networks. Voice portal 143 is 
adapted to enable a developer or a voice application consumer to call in and perform 
functional operations (such as access, monitor, modify) on selected voice applications. 

Within application server 110 there is an instance of voice application
development server 142 adapted in conjunction with the existing components 111-113 to
provide dynamic voice application development and deployment according to 
embodiments of the invention. 

Portal 143 is accessible via network connection to station 140 and via a network 
bridge to a voice application consumer through telephony network 134. In one example, 
portal 143 is maintained as part of application server 110. In addition to being an
access point for consumers, portal 143 is chiefly adapted as a developer's interface server. Portal
143 is enabled by a software (SW) instance 144 adapted as a server instance to CL 141. In a
telephony embodiment, portal 143 may be an interactive voice response (IVR) unit. 

In a preferred embodiment, the producer or developer of a voice application 
accesses application server 110 through portal 143 and data network 120 using remote
station 140 as a "Web interface" and first creates a list of contacts. In an alternative
embodiment, station 140 has direct access to application server 110 through a network
interface. Contacts are analogous to consumers of created voice applications. CL 141
displays, upon request and in order of need, all of the required interactive interfaces for
designing, modifying, instantiating, and executing completed voice applications to launch
from application server 110 and to be delivered by server 130. 

The software of the present invention enables voice applications to be modeled as 
a set of dialog objects having business and telephony (or other communication 
delivery/access system) rules as parameters without requiring the developer to perform
complicated coding operations. A dialog template is provided for modeling dialog
states. The dialog template creates the actual speech dialog, specifies the voice
application consumer (recipient) of the dialog, captures the response from the voice
application consumer, and performs any follow-up actions based upon system
interpretation of the consumer response. A dialog is a reusable component and can be
linked to a new dialog or to an existing (stored) dialog. A voice application is a set of
dialogs inter-linked by a set of business rules defined by the voice application producer.
Once the voice application is completed, it is deployed by server 110 and is eventually
accessible to the authorized party (device 135) through telephony server 130. 

The voice applications are in a preferred embodiment in the form of VXML to run
on VXML-compliant telephony server 130. This process is enabled through VXML
rendering engine 111. Engine 111 interacts directly with server 130, locates the voice
application at issue, retrieves its voice application logic, dynamically creates the
presentation in VXML, and forwards it to server 130 for processing and delivery. Once
interpreter 131 interprets the VXML presentation it is sent to or accessible to device 135
in the form of an interactive dialog (in this case an IVR dialog). Any response from
device 135 follows the same path back to application server 110 for interpretation by
engine 111. Server 110 then retrieves the voice application profile from the database
accessible through adapter 113 and determines the next business rule to execute locally.
Based upon the determination a corresponding operation associated with the rule is taken.
A next (if required) VXML presentation is then forwarded to rendering engine 111, which
in turn dynamically generates the next VXML page for interpretation, processing and
deployment at server 130. This two-way interaction between the VXML-compliant
telephony server (130) and the voice application server (110) continues in the form of an
automated logical sequence of VXML dialogs until the voice application finally reaches
its termination state. 
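
The two-way exchange described above can be pictured as a loop. The following is a minimal sketch only, not the rendering engine itself: `DIALOGS`, `run_application`, and the dictionary layout are hypothetical names chosen for illustration, and the consumer's answers are supplied as a plain list rather than arriving through a telephony server.

```python
# Hypothetical sketch of the server-side dialog loop; all names are illustrative.

# Each dialog has a prompt and a rule table mapping a response to the next dialog.
# A next-dialog value of None marks the termination state.
DIALOGS = {
    "greeting": {
        "prompt": "Do you want your balance?",
        "rules": {"yes": "balance", "no": None},
    },
    "balance": {"prompt": "Your balance is ten dollars.", "rules": {}},
}


def run_application(start, responses):
    """Walk the dialogs and return the prompts that would be rendered, in order."""
    rendered = []
    answers = iter(responses)
    current = start
    while current is not None:
        dialog = DIALOGS[current]
        rendered.append(dialog["prompt"])        # stands in for VXML page generation
        if not dialog["rules"]:                  # no rule to apply: application ends
            break
        response = next(answers, None)           # response returned by the consumer
        current = dialog["rules"].get(response)  # determine the next rule locally
    return rendered
```

Calling `run_application("greeting", ["yes"])` walks both dialogs, while a "no" answer ends the sequence after the first prompt.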

A voice application (set of one or more dialogs) can be delivered to the consumer 
(target audience) in outbound or inbound fashion. For an inbound voice application, a 
voice application consumer calls in to voice portal 143 to access the inbound voice
application served from server 130. The voice portal can be mapped to a phone number
directly or as an extension to a central phone number. In a preferred embodiment the
voice portal also serves as a community forum where voice application producers can put
their voice applications into groups for easy access and perform operational activities
such as voice application linking, reporting, text-to-speech recording, and so on.

For an outbound voice application there are two sub-types. These are on-demand 
outbound applications and scheduled outbound applications. For on-demand outbound 
applications, server 110 generates an outbound call as soon as the voice application
producer issues an outbound command associated with the application. The outbound
call is made to the target audience and upon receipt of the call the voice application is
launched from server 130. For scheduled outbound applications, the schedule server (not
shown within server 110) launches the voice application as soon as the producer-specified
date and time have arrived. In a preferred embodiment both on-demand and scheduled
outbound application deployment functions support unicast, multicast, and broadcast
delivery schemes.
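
The difference between the two outbound sub-types can be sketched as follows. This is an illustration under stated assumptions: `OutboundLaunch` and `due_calls` are hypothetical names, an on-demand launch is modeled simply as a launch with no scheduled time, and a list of contacts stands in for unicast, multicast, or broadcast delivery.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class OutboundLaunch:
    """Hypothetical record of an outbound launch request."""
    application: str
    contacts: List[str]
    launch_at: Optional[datetime] = None  # None models an on-demand launch


def due_calls(launch, now):
    """Return the contacts to call at time `now`, or an empty list if not yet due."""
    if launch.launch_at is None or now >= launch.launch_at:
        return list(launch.contacts)
    return []
```

An on-demand launch is due immediately, whereas a scheduled launch yields no calls until the producer-specified date and time have arrived.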

As described above, a voice application created by application server 110 consists
of one or more dialogs. The contents of each dialog can be static or dynamic. Static
content is content sourced from the voice application producer. The producer creates the
contents when the voice application is created. Dynamic content is sourced from a
third-party data source.

In a preferred embodiment a developer's tool contains an interactive dialog design
panel (described in detail later) wherein a producer inputs a reference link in the form of
Extensible Markup Language (XML) to the dialog description or response field. When a
dialog response is executed and interpreted by application server 110, the reference link
invokes a resource Application-Program-Interface (API) that is registered in resource
adapter 113. The API goes out in real time, retrieves the requested data, and integrates
the returned data into the existing dialog. The resulting and subsequent VXML page
being generated has the dynamic data embedded in it.

One object of the present invention is a highly dynamic, real time IVR system that 
tailors itself automatically to the application developer's specified data source 
requirement. Another object of the present invention is to enable rapid development and 
deployment of a voice application without requirement of any prior knowledge of VXML 
or any other programming technologies. A further object of the present invention is to 
reduce the typical voice application production cycle and drastically reduce the cost of 
production. 

Fig. 2 is a process flow diagram illustrating steps for creating a voice application 
shell or container for a VXML voice application according to an embodiment of the 
present invention. A developer utilizing a client application known as a thin client,
analogous to CL 141 on station 140 described with reference to Fig. 1B, creates a voice
application shell or voice application container. At step 201 the developer logs in to the
system at a login page. At step 202 the developer creates a contact list of application
consumers. Typically a greeting or welcome page would be displayed before step 202.
An application consumer is an audience of one or more entities that would have access to
and interact with a voice application. A contact list is first created so that all of the
intended contacts are available during voice application creation if call routing logic is
required later on. The contacts can either be entered individually by the producer, in the
event of more than one contact, or be imported as a set list from some
organizer/planner software, such as Microsoft Outlook™ or perhaps a PDA™ organizer.

In one embodiment of the present invention the contact list may reside on an 
external device accessed by a provided connector (not shown) that is configured properly 
and adapted for the purpose of accessing and retrieving the list. This approach may be 
used, for example, if a large, existing customer database is used. Rather than create a 
copy, the needed data is extracted from the original and provided to the application.

At step 203, a voice application header is populated. A voice application header
is simply a title field for the application. The field contains a name for the application
and a description of the application. At step 204, the developer assigns either an
inbound or outbound state for the voice application. An outbound application is
delivered through an outbound call while the consumer accesses an inbound voice
application. 

In the case of the inbound application, in step 205 the system sets a default 
addressee for inbound communications. The developer selects a dialog from a configured 
list in step 206. It is assumed in this example that the dialogs have already been created.
At step 207, the developer executes the dialog and it is deployed automatically.

In the case of an outbound designation in step 204, the developer chooses a launch 
type in step 208. A launch type can be either an on-demand type or a scheduled type. If 
the choice made by the developer in step 208 is scheduled, then in step 209, the developer 
enters all of the appropriate time and date parameters for the launch including parameters
for recurring launches of the same application. In the case of an on-demand selection for
application launch in step 208, the developer in step 210 selects one or more contacts
from the contact list established in step 202. It is noted herein that step 210 is also
undertaken by the developer after step 209 in the case of a scheduled launch. At step 207,
the dialog is created. In this step a list of probable dialog responses for a voice
application wherein interaction is intended may also be created and stored for use.

In general sequence, a developer creates a voice application and integrates the 
application with a backend data source or, optionally, any third party resources and 
deploys the voice application. The application consumer then consumes the voice 
application and, optionally, the system analyzes any consumer feedback collected by the
voice application for further interaction if appropriate. The steps of this example pertain
to generating and launching a voice application from "building blocks" that are already in
place.

Fig. 3 is a block diagram illustrating a simple voice application container 300 
according to an embodiment of the present invention. Application container 300 is a
logical container or "voice application object". Also termed a shell, container 300 is
logically illustrated as a possible result of the process of Fig. 2 above. Container 300
contains one or more dialog states illustrated herein as dialogs 301a-n labeled in this
example as dialogs 1-4. Dialogs 301a-n are objects and therefore container 300 is a
logical grouping of the set of dialog objects 301a-n.

The represented set of dialog objects 301a-n is interlinked by business rules 
labeled rules 1-4 in this example. Rules 1-4 are defined by the developer and are rule 
objects. It is noted herein that there may be many more or fewer dialog objects 301a-n
as well as interlinking business rule objects 1-4 comprising container object 300
without departing from the spirit and scope of the present invention. The inventor
illustrates 4 of each entity and deems the representation sufficient for the purpose of 
explaining the present invention. 

In addition to the represented objects, voice application shell 300 includes a 
plurality of settings options. In this example, basic settings options are tabled for
reference and given the element numbers 305a-c illustrating 3 listed settings options.
Reading in the table from top to bottom, a first setting launch type (305a) defines an
initial entry point for voice application 300 into the communications system. As
described above with reference to Fig. 2 step 204, the choices for launch type 305a are
inbound or outbound. In an alternative embodiment, a launch type may be defined by a
third party and be defined in some other pattern than inbound or outbound.

Outbound launch designation binds a voice application to one or more addressees 
(consumers). The addressee may be a single contact or a group of contacts represented by 
the contact list or distribution list also described with reference to Fig. 2 above (step 202). 
When the outbound voice application is launched in this case, it is delivered to the
addressee designated on a voice application outbound contact field (not shown). All
addressees designated receive a copy of the outbound voice application and have equal
opportunity to interact (if allowed) with the voice application dialog and the
corresponding backend data resources if they are used in the particular application.

In the case of an inbound voice application designation for launch type 305a, the
system instructs the application to assume a ready stand-by mode. The application is
launched when the designated voice application consumer actively makes a request to
access the voice application. A typical call center IVR system assumes this type of
inbound application.

Launch time setting (305b) is only enabled as an option if the voice application 
launch type setting 305a is set to outbound. The launch time setting is set to instruct a 
novel scheduling engine, which may be assumed to be part of the application server 
function described with reference to Fig. 1B. The scheduling engine controls the
parameter of when to deliver the voice application to the designated
addressees. The time setting may reflect on-demand, scheduled launch, or any
third-party-defined patterns.

On-demand gives the developer full control over the launch time of the voice 
application. The on-demand feature also allows any third-party system to issue a trigger
event to launch the voice application. It is noted herein that in the case of third-party
control the voice application interaction may transcend more than one communications
system and/or network.

Property setting 305c defines essentially how the voice application should behave 
in general. Possible state options for setting 305c are public, persistent, or sharable. A
public state setting indicates that the voice application should be accessible to anyone
within the voice portal domain so that all consumers with minimum privilege can access
the application. A persistent state setting for property 305c ensures that only one copy of
the voice application is ever active regardless of how many consumers are attempting to
access the application. An example of such a scenario would be that of a task-allocation
voice application. For example, in a task-allocation scenario there are only a limited number of
time slots available for a user to access the application. If the task is a request from a
pool of contacts such as perhaps customer-support technicians to lead a scheduled chat
session, then whenever a time slot has been selected, the other technicians can only select
the slots that are remaining. Therefore if there is only one copy of the voice application
circulating within the pool of technicians, the application captures the technician's
response on a first-come, first-served basis.
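
The task-allocation scenario can be sketched as a single shared object. This is an illustration only: `SlotAllocator` and its methods are hypothetical names, and the sketch ignores concurrency control, which a real single-copy application would need.

```python
class SlotAllocator:
    """Hypothetical single shared copy of a task-allocation voice application."""

    def __init__(self, slots):
        # Every slot starts unassigned.
        self.assignments = {slot: None for slot in slots}

    def remaining(self):
        """Slots that later technicians may still select."""
        return [slot for slot, owner in self.assignments.items() if owner is None]

    def claim(self, slot, technician):
        """Assign the slot if it exists and is still free; first come, first served."""
        if slot in self.assignments and self.assignments[slot] is None:
            self.assignments[slot] = technician
            return True
        return False
```

Because every technician interacts with the same object, a slot claimed by one technician is no longer offered to the others.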

A sharable application state setting for property 305c enables the consumer to
"see" the responses of other technicians in the dialog at issue, regardless of whether the
voice application is persistent or not. Once the voice application shell is created, the
producer can then create the first dialog of the voice application as described with
reference to Fig. 2 step 207. It is noted herein that shell 300 is modeled using a
remote and preferably a desktop client that will be described in more detail later in this 
specification. 
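
The three settings options can be summarized in a small data structure. The sketch below is an assumption-laden illustration: `ShellSettings` and `validate` are hypothetical names, third-party-defined launch patterns are omitted, and the property states are modeled as a set because the text indicates that sharable may be combined with persistent.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

LAUNCH_TYPES = {"inbound", "outbound"}
LAUNCH_TIMES = {"on-demand", "scheduled"}
PROPERTIES = {"public", "persistent", "sharable"}


@dataclass
class ShellSettings:
    launch_type: str                                   # setting 305a
    launch_time: Optional[str] = None                  # setting 305b
    properties: Set[str] = field(default_factory=set)  # setting 305c

    def validate(self):
        """Return a list of problems; an empty list means the settings are consistent."""
        problems = []
        if self.launch_type not in LAUNCH_TYPES:
            problems.append("unknown launch type")
        if self.launch_time is not None and self.launch_type != "outbound":
            problems.append("launch time applies only to outbound applications")
        if self.launch_time is not None and self.launch_time not in LAUNCH_TIMES:
            problems.append("unknown launch time")
        unknown = self.properties - PROPERTIES
        if unknown:
            problems.append("unknown property: " + ", ".join(sorted(unknown)))
        return problems
```

The second check mirrors the statement that the launch time setting is enabled only when the launch type is outbound.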

Fig. 4 is a block diagram illustrating a dialog object model 400 according to an
embodiment of the present invention. Dialog object model 400 is analogous to any of
dialog objects 301a-n described with reference to Fig. 3 above. Object 400 models a
dialog and all of its properties. A properties object illustrated within dialog object 400
and labeled Object Properties (410) contains the dialog type and properties including
behavior states and business rules that apply to the dialog.

For example, every dialog has a route-to property illustrated in the example as 
Route To property (411). Property 411 maps to and identifies the source of the dialog.
Similarly, every dialog has a route-from property illustrated herein as Route From
property (412). Route From property 412 maps to and identifies the recipient contact of
the dialog or the dialog consumer.

Every dialog falls under a dialog type illustrated in this example by a property 
labeled Dialog Type and given the element number 413. Dialog type 413 may include but 
is not limited to the following types of dialogs: 

1. Radio Dialog: A radio dialog allows a voice application consumer to interactively
select one of the available options from an option list after hearing the dialog
description.

2. Bulletin Dialog: A bulletin dialog allows a voice application consumer to interact
with a bulletin board-like forum where multiple consumers can share voice
messages in an asynchronous manner.

3. Statement Dialog: A statement dialog plays out a statement to a voice application
consumer without expecting any responses from the consumer.

4. Open Entry Dialog: An open entry dialog allows a voice application consumer to
record a message of a pre-defined length after hearing the dialog description.

5. Third Party Dialog: A third party dialog is a modular container structure that
allows the developer to create a custom-made dialog type with its own properties
and behaviors. An example would be Nuance's SpeechObject™.
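
The dialog object model and the dialog types listed above can be sketched as follows. The class and field names are hypothetical and chosen for illustration. Treating every type other than a statement dialog as expecting a response is an assumption, particularly for third party dialogs, whose behavior is defined by the developer.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DialogType(Enum):
    RADIO = "radio"
    BULLETIN = "bulletin"
    STATEMENT = "statement"
    OPEN_ENTRY = "open entry"
    THIRD_PARTY = "third party"


@dataclass
class Dialog:
    name: str
    description: str
    dialog_type: DialogType                            # property 413
    route_to: str = ""                                 # property 411
    route_from: str = ""                               # property 412
    options: List[str] = field(default_factory=list)   # option list of a radio dialog

    def expects_response(self):
        """Statement dialogs play out without expecting a response."""
        return self.dialog_type is not DialogType.STATEMENT
```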

Each dialog type has one or more associated business rules tagged to it enabling 
determination of a next step in response to a perceived state. A rule compares the 
application consumer response with an operand defined by the application developer 
using an operational code such as less than, greater than, equal to, or not equal to. In a 
preferred embodiment of the invention the parameters surrounding a rule are as follows: 

If user response is equal to the predefined value, then perform one of the 
following: 

A. Do nothing and terminate the dialog state. 

B. Do a live bridge transfer to the contact specified. Or, 

C. Send another dialog to another contact. 
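
The comparison of a consumer response with a developer-defined operand can be sketched as below. `BusinessRule`, `OPCODES`, and `first_matching_action` are hypothetical names, and the action labels are illustrative stand-ins for options A to C above. The patent does not state how multiple rules are ordered, so returning the first match is an assumption.

```python
import operator
from dataclasses import dataclass

# Operational codes named in the text, mapped to Python comparison functions.
OPCODES = {
    "less than": operator.lt,
    "greater than": operator.gt,
    "equal to": operator.eq,
    "not equal to": operator.ne,
}


@dataclass
class BusinessRule:
    opcode: str
    operand: object
    action: str        # "terminate", "live_transfer", or "send_dialog"
    target: str = ""   # contact or dialog the action applies to


def first_matching_action(rules, response):
    """Return (action, target) of the first rule the response satisfies, else None."""
    for rule in rules:
        if OPCODES[rule.opcode](response, rule.operand):
            return rule.action, rule.target
    return None
```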

In the case of an outbound voice application, there are likely to be exception- 
handling business rules associated with perceived states. In a preferred embodiment of the 
present invention, exception handling rules are encapsulated into three different events: 

1. An application consumer designated to receive the voice application rejects a
request for interacting with the voice application. 

2. An application consumer has a busy connection at the time of launch of the voice 
application, for example, a telephone busy signal. And, 

3. An application consumer's connection is answered by or is redirected to a non- 
human device, for example, a telephone answering machine. 

For each of the events above, any one of three follow-up actions is possible
according to perceived state:

1. Do nothing and terminate the dialog state.

2. Redial the number.

3. Send another dialog to another contact.
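
The pairing of exception events with follow-up actions can be sketched as a lookup table. The enumeration and function names are hypothetical, and falling back to termination when no action has been configured is an assumption made for this illustration.

```python
from enum import Enum


class CallException(Enum):
    REJECTED = "consumer rejects the request"
    BUSY = "busy connection"
    MACHINE = "answered by a non-human device"


class FollowUp(Enum):
    TERMINATE = "do nothing and terminate the dialog state"
    REDIAL = "redial the number"
    SEND_DIALOG = "send another dialog to another contact"


def follow_up_for(event, configured, default=FollowUp.TERMINATE):
    """Look up the producer-configured follow-up action for an exception event."""
    return configured.get(event, default)
```

A producer might, for example, configure a redial for busy signals and leave the other events at the default.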

Fig. 5 is a process flow diagram illustrating steps for voice dialog creation for a 
VXML-enabled voice application according to an embodiment of the present invention. 
All dialogs can be reused for subsequent dialog routing. There is, as previously
described, a set of business rules for every dialog and contact pair. A dialog can be active and
can transition from one dialog state to another only when it is rule enabled.

At step 501 a developer populates a dialog description field with a dialog 
description. A dialog description may also contain reference to XML tags as will be 
described further below. At step 502, parameters of the dialog type are entered based on
the assigned type of dialog. Examples of the available parameters were described with 
reference to Fig. 4 above. 

At step 503 the developer configures the applicable business rules for the dialog 
type covering, as well, follow up routines. In one embodiment rules configuration at step 
503 resolves to step 505 for determining follow-up routines based on the applied rules.
For example, the developer may select at step 505 one of three types of transfers: the
developer may configure for a live transfer as illustrated by step 506;
transfer to a next dialog for creation as illustrated by step 507; or the developer may
configure for dialog completion as illustrated by step 508.

If the developer does not branch off into configuring sub-routines 506, 507, or 508
from step 505, but rather continues from step 503 to step 504 wherein inbound or
outbound designation for the dialog is system assigned, then the process must branch
from step 504 to either step 508 or 509, depending on whether the dialog is inbound or
outbound. If at step 504, the dialog is inbound, then at step 508 the dialog is completed.
If the assignment at step 504 is outbound, then the process proceeds to step 509 to
configure call exception business rules.

At step 510, the developer configures at least one follow-up action for system 
handling of exceptions. If no follow-up actions are required to be specified at step 510, 
then the process resolves to step 508 for dialog completion. If an action or actions are 
configured at step 510, then at step 511 the action or actions are executed, such as a
system re-dial, which is the illustrated action for step 511.

In a preferred embodiment, once the voice application has been created, it can be 
deployed and accessed through the telephone. The method of access, of course, depends 
on the assignment configured at step 504. For example, if the application is inbound, the 
application consumer accesses a voice portal to access the application. As described 
further above, a voice portal is a voice interface for accessing a selected number of 
functions of the voice application server described with reference to Fig. 1B above. A
voice portal may be a connection-oriented-switched-telephony (COST) enabled portal or 
a data-network-telephony (DNT) enabled portal. In the case of an outbound designation 
at step 504, the application consumer receives the voice application through an incoming 
call to the consumer originated from the voice application server. In a preferred 
embodiment, the outbound call can be either COST based or DNT based depending on 
the communications environment supported. 

Fig. 6 is a block diagram illustrating a dialog transition flow after initial 
connection with a consumer according to an embodiment of the present invention. Some 
of the elements illustrated in this example were previously introduced with respect to the 
example of Fig. 1B above and therefore shall retain their original element numbers. In
this example, an application consumer is logically illustrated as Application Consumer
600 that is actively engaged in interaction with a dialog 601 hosted by telephony server
130. Server 130 is, as previously described, a VXML-compliant telephony server and is so
labeled.

Application server 110 is also actively engaged in the interaction sequence and
has the capability to provide dynamic content to consumer 600. As application consumer
600 begins to interact with the voice application represented herein by dialog 601 within
telephony server 130, voice application server 110 monitors the situation. In actual
practice, each dialog processed and sent to server 130 for delivery to or access by
consumer 600 is an atomic unit of the particular voice application being deployed and
executed. Therefore dialog 601 may logically represent more than one single dialog.

In this example, assuming more than one dialog, dialog 601 is responsible during
interaction for acquiring a response from consumer 600. Arrows labeled Send and
Respond represent the described interaction. When consumer 600 responds to dialog
content, the response is sent back along the same original path to VXML rendering
engine 111, which interprets the response and forwards the interpreted version to a
provided dialog controller 604. Controller 604 is part of application logic 112 in server
110 described with reference to Fig. 1B. Dialog controller 604 is a module that has the
ability to perform table lookups, data retrieval, and data write functions based on
established rules and configured response parameters.

When dialog controller 604 receives a dialog response, it stores the response
corresponding to the dialog at issue (601) to a provided data source 602 for data mining
operations and workflow monitoring. Controller 604 then issues a request to a provided
rules engine 603 to look up the business rule or rules that correspond to the stored
response. Once the correct business rule has been located for the response, the dialog
controller starts interpretation. If the business rule accessed requires reference to a
third-party data source (not shown), controller 604 makes the necessary data fetch from the
source. Any data returned by controller 604 is integrated into the dialog context and
passed onward to VXML rendering engine 111 for dialog page generation of a next dialog
601. The process repeats until dialog 601 terminates.
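
The sequence of controller operations described above (store, rule lookup, optional fetch, hand-off) can be sketched as a single function. This is a minimal illustration, not the controller's actual interface: `handle_response`, the dictionary-based data source and rule table, and the `needs_fetch` flag are all hypothetical.

```python
def handle_response(dialog_name, response, data_source, rules, fetch=None):
    """One hypothetical controller step: store, look up the rule, fetch, build context."""
    # 1. Store the response against the dialog for later data mining.
    data_source.setdefault(dialog_name, []).append(response)
    # 2. Look up the business rule that corresponds to the stored response.
    rule = rules.get((dialog_name, response))
    if rule is None:
        return None  # no applicable rule: the dialog terminates
    # 3. Fetch third-party data only if the rule asks for it.
    context = {"next_dialog": rule["next_dialog"]}
    if rule.get("needs_fetch") and fetch is not None:
        context["data"] = fetch(response)
    # 4. The context would then be passed on to the rendering engine.
    return context
```

The returned context stands in for what would be handed to the rendering engine for generation of the next dialog page.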

In one embodiment, the business rule accessed by controller 604 as a result of a
received response from consumer 600 carries a dialog transition state other than back to
the current application consumer. In this case controller 604 spawns an outbound call
from application server 110 to deliver the next or "generated dialog" to the designated
target application consumer. At the same time, the current consumer has his/her dialog
state completed as described with reference to Fig. 5 step 508 according to predefined
logic specified in the business rule.

It will be apparent to one with skill in the art that a dialog can contain dynamic 
content by enabling controller 604 to have access to data source 602 according to rules 
served by rule engine 603. In most embodiments there are generally two types of 
dynamic content. Both types are, in preferred embodiments, structured in the form of
XML and are embedded directly into the next generated dialog page. The first of the two
types of dynamic content is classified as non-recurring. Non-recurring content makes a
relative reference to a non-recurring resource label in a resource adapter registry within a
resource adapter analogous to adapter 113 of voice application server 110 described with
reference to Fig. 1B.

In the above case, when dialog controller 604 interprets the dialog, it first scans 
for any resource label. If a match is found, it looks up the resource adapter registry and 
invokes the corresponding resource API to fetch the required data into the new dialog
context. Once the raw data is returned from the third-party data source, it passes the raw
data to a corresponding resource filter for further processing. When completed in terms 
of processing by the filter, the dialog resource label or tag is replaced with the filtered 
data and is integrated transparently into the new dialog. 
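
The scan, lookup, fetch, filter, and replace sequence for non-recurring content can be sketched with a regular expression. This is an illustration under stated assumptions: `resolve_labels`, the registry keyed by (type, name), and the label names `time_greeting` and `target_contact` are hypothetical, and the sketch handles only self-closing labels without parameters.

```python
import re

# Matches self-closing labels such as <resource type='SYSTEM' name='target_contact'/>.
LABEL = re.compile(r"<resource\s+type='(?P<type>\w+)'\s+name='(?P<name>\w+)'\s*/>")


def resolve_labels(description, registry, filters=None):
    """Replace each resource label with filtered data from its registered API."""
    filters = filters or {}

    def substitute(match):
        key = (match.group("type"), match.group("name"))
        api = registry.get(key)
        if api is None:
            return match.group(0)           # unknown label: leave it untouched
        raw = api()                         # fetch the raw data
        return filters.get(key, str)(raw)   # pass it through the resource filter

    return LABEL.sub(substitute, description)
```

Each registry entry stands in for a resource API, and each filter for the corresponding resource filter; a label with no registered API is left in place rather than removed.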

The second type of dynamic content is recurring. Recurring content usually
returns more than one set of a name and value pair. An example would be a list of stocks
in an application consumer's stock portfolio. For example, a dialog that enables
consumer 600 to parrot a specific stock and have the subsequent quote returned through
another dialog state is made to use recurring dynamic content to achieve the desired
result. Recurring content makes a relative reference to a recurring resource label in the
resource adapter registry of voice application server 110. When controller 604 interprets
the dialog, it handles the resource in an identical manner to handling of non-recurring
content. However, instead of simply returning the filtered data back to the dialog context,
it loops through the data list and configures each listed item as a grammar-enabled
keyword. In so doing, consumer 600 can parrot one of the items (separate stocks) in the
list played in the first dialog and have the response captured and processed for return in
the next dialog state. The stock-quote example presented below illustrates possible
dialog/response interactions from the viewpoint of consumer 600.

Voice Application: "Good morning Leo, what stock quote do you want?" 
Application Consumer: "Oracle" 
Voice Application: "Oracle is at seventeen dollars."

Voice Application: "Good morning Leo, what stock quote do you want?" 

This particular example consists of two dialogs. 

The first dialog plays out the statement "Good morning Leo, what stock quote do
you want?" The dialog is followed by a waiting state that listens for keywords such as
Oracle, Sun, Microsoft, etc. The statement contains two dynamic non-recurring
resource labels. The first one is the time of day: Good morning, good afternoon, or good
evening. The second dynamic content is the name of the application consumer. In this
case, the name of the consumer is internal to the voice application server, thus the type of
the resource label is SYSTEM. In the actual dialog description field, it may look
something like this:

<resource type='ADAPTER' name='time greeting'/> <resource type='SYSTEM'
name='target_contact'/>, what stock quote do you want?

Because the dialog is expecting the consumer to say a stock out of his/her existing
portfolio, the dialog type is radio dialog, and the expected response property of the radio
dialog is

<resource type='ADAPTER' name='stock_list'>
  <param>
    <resource type='SYSTEM' name='target_contact_id'/>
  </param>
</resource>

This XML resource label tells dialog controller 604 to look for a resource label
named stock_list and to invoke the corresponding API with target_contact_id as the
parameter. Upon completion of the data fetching, the list of stocks is integrated into the
dialog as part of the grammars. Whatever stock the user names in response is matched
against these grammars (the stocks in the portfolio), and the grammar return value is
assigned to the dialog response, which can then be forwarded to the next
dialog as a resource of DIALOG type.

The producer can make reference to any dialog return values in any subsequent
dialog by using <resource type='DIALOG' name='dialog_name'/>. This rule enables
the producer to play out the options the application consumer selected previously in any
follow-up dialogs.
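
The handling of recurring content can be sketched in Python as follows. This is a simplified illustration, assuming the stock_list resource API returns a plain list of names; fetch_stock_list, build_grammar, match_response and the sample portfolio are hypothetical names and data introduced only for the example.

```python
def fetch_stock_list(contact_id):
    # Hypothetical stand-in for the stock_list resource API.
    portfolios = {"leo": ["Oracle", "Sun", "Microsoft"]}
    return portfolios.get(contact_id, [])

def build_grammar(items):
    """Configure each listed item as a keyword mapped to its return value."""
    return {item.lower(): item for item in items}

def match_response(utterance, grammar):
    """Return the grammar return value for a spoken keyword, or None."""
    return grammar.get(utterance.strip().lower())

grammar = build_grammar(fetch_stock_list("leo"))

# The matched value is kept under the dialog's name so that a later dialog
# can refer to it as a resource of DIALOG type.
dialog_returns = {"select stock dialog": match_response("Oracle", grammar)}
```

Keying the stored value by dialog name mirrors the way a follow-up dialog refers back to an earlier response by name rather than by position.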

The second dialog illustrated above plays out the quote of the stock selected from
the first dialog, then returns the flow back to the first dialog. Because no extra branching
logic is involved in this dialog, the dialog type in this case is a statement dialog. The
dialog's follow-up action is simply to forward the flow back to the first dialog. In such a
case, the dialog statement is:

<resource type='DIALOG' name='select stock dialog'/>
<resource type='ADAPTER' name='get_stock_quote'>
  <param>
    <resource type='DIALOG' name='select stock dialog'/>
  </param>
</resource>

Besides making reference to ADAPTER, DIALOG and SYSTEM types, the dialog
can also take in other resource types such as SOUND and SCRIPT. SOUND can be used
to impersonate the dialog description by inserting a sound clip into the dialog description.
For example, to play a sound after the stock quote, the producer inserts <resource
type='SOUND' name='beep'/> right after the ADAPTER resource tag.
The producer can add a custom-made VXML script into the dialog description by using
<resource type='RESOURCE' name='confirm'/> so that in the preferred embodiment,
any VXML can be integrated into the dialog context transparently with maximum
flexibility and expandability.

It will be apparent to one with skill in the art that while the examples cited herein
use VXML and XML as the mark-up languages and tags, other
suitable markup languages can be utilized in place of or integrated with the mentioned
conventions without departing from the spirit and scope of the invention. It will also be
apparent to the skilled artisan that while the initial description of the invention is made in
terms of a voice application server having an interface to a telephony server using generally
HTTP requests and responses, the present invention can be
practiced in any system that is capable of handling well-defined requests and responses
across any distributed network.

Figs. 7-15 illustrate various displayed Browser frames of a developer platform
interface analogous to CL 141 of station 140 of Fig. 1B. Description of the following
interface frames and frame contents assumes existence of a desktop computer host
analogous to station 140 of Fig. 1B wherein interaction is enabled in HTTP
request/response format as would be the case of developing over the Internet network for
example. However, the following description should not limit the method and apparatus
of the invention in any way as differing protocols, networks, interface designs and scope
of operation can vary.

Fig. 7 is a plan view of a developer's frame 700 containing a developer's login screen
710 according to an embodiment of the present invention. Frame 700 is presented to a
developer in the form of a Web browser container according to one embodiment of the
invention. Commercial Web browsers are well known and any suitable Web browser will
support the platform. Frame 700 has all of the traditional Web options associated with
most Web browser frames including back, forward, Go, File, Edit, View, and so on. A
navigation tool bar is visible in this example. Screen 710 is a login page. The developer
may, in one embodiment, have a developer's account. In another case, more than one
developer may share a single account. There are many possibilities.

Screen 710 has a field for inserting a login ID and a field for inserting a login
personal identification number (PIN). Once login parameters are entered, the developer
submits the data by clicking on a button labeled Login. Screen 710 may be adapted for
display on a desktop computer or any one of a number of other network capable devices
following specified formats for display used on those particular devices.

Fig. 8 is a plan view of a developer's frame 800 containing a screen shot of a
home page of the developer's platform interface of Fig. 7. Frame 800 contains a
sectioned screen comprising a welcome section 801, a product identification section 802
and a navigation section 803 combined to fill the total screen or display area. The
commercial name coined by the inventor for a voice application developer's platform is
Fonelet. Navigation section 803 is provided to display on the "home
page" and on subsequent frames of the software tool.

Navigation section 803 contains, reading from top to bottom, a plurality of useful
links, starting with a link to home followed by a link to an address book. A link for
creating a new Fonelet (voice application) is labeled Create New. A link to "My"
Fonelets is provided as well as a link to "Options". A standard Help link is illustrated
along with a link to Logout. An additional "Options Menu" is the last illustrated link in
section 803. Section 803 may have additional links that are visible by scrolling down
with the provided scroll bar traditional to the type of display of this example.

Fig. 9 is a plan view of a developer's frame 900 containing a screen shot of an
address book 911 accessible through interaction with the option Address in section 803 of
the previous frame of Fig. 8. Screen 911 has an interactive option for listing individual
contacts and for listing contact lists. A contact list is a list of voice application consumers
and a single contact represents one consumer in this example. However, in other
embodiments a single contact may mean more than one entity. Navigation screen 803 is
displayed on the left of screen 911. In this example, contacts are listed by First Name
followed by Last Name, followed by a telephone number and an e-mail address. Other
contact parameters may also be included or excluded without departing from the spirit
and scope of the invention. For example, the Web site of a contact may be listed and may
also be the interface for receiving a voice application. To the left of the listed contacts
are interactive selection boxes used for selection and configuration purposes. Interactive
options are displayed in the form of Web buttons and adapted to enable a developer to
add or delete contacts.

Fig. 10 is a plan view of a developer's frame 1000 displaying a screen 1001 for
creating a new voice application. Screen 1001 initiates creation of a new voice
application termed a Fonelet by the inventor. A name field 1002 is provided in screen
1001 for inputting a name for the application. A description field 1003 is provided for
the purpose of entering the application's description. A property section 1004 is
illustrated and adapted to enable a developer to select from available options listed as
Public, Persistent, and Shareable by clicking on the appropriate check boxes.

A Dialog Flow Setup section is provided and contains a dialog type section field
1005 and a subsequent field for selecting a contact or contact group 1006. After the
required information is correctly populated into the appropriate fields, a developer may
"create" the dialog by clicking on an interactive option 1007 labeled Create.

Fig. 11 is a plan view of a developer's frame 1100 illustrating screen 1001 of Fig.
10 showing further options as a result of scrolling down. A calling schedule
configuration section 1101 is illustrated and provides the interactive options of On
Demand or Scheduled. As was previously described, selecting On Demand enables
application deployment at the will of the developer while selecting Scheduled initiates
configuration for a scheduled deployment according to time/date parameters. A grouping
of entry fields 1102 is provided for configuring Time Zone and Month of launch. A
subsequent grouping of entry fields 1103 is provided for configuring the Day of Week
and the Day of Month for the scheduled launch. A subsequent grouping of entry fields
1104 is provided for configuring the hour and minute of the scheduled launch. It is noted
herein that the options enable a repetitive launch of the same application. Once the
developer finishes specifying the voice application shell, he or she can click a Create
Dialog button labeled Create to spawn an overlying browser window for dialog creation.

Fig. 12 is a screen shot of a dialog configuration window 1200 illustrating a dialog
configuration page according to an embodiment of the invention. In this window a
developer configures the first dialog that the voice application or Fonelet will link to. A
dialog identification section 1201 is provided for the purpose of identifying and
describing the dialog to be created. A text entry field for entering a dialog name and a
text entry field for entering dialog description are provided. Within the dialog description
field, an XML resource tag (not shown) is inserted which, for example, may refer to a
resource label machine code registered with a resource adapter within the application
server analogous to adapter 113 and application server 110 described with reference to
Fig. 1B.

A section 1202 is provided within screen 1200 and adapted to enable a developer
to configure for expected responses. In this case the type of dialog is a Radio Dialog.
Section 1202 serves as the business rule logic control for multiple choice-like dialogs.
Section 1202 contains a selection option for Response of Yes or No. It is noted herein
that there may be more and different expected responses in addition to a simple yes or no
response.

An adjacent section is provided within section 1202 for configuring any Follow-Up
Action to occur as the result of an actual response to the dialog. For example, an
option of selecting No Action is provided for each expected response of Yes and No. In
the case of a follow-up action, an option for Connect is provided for each expected
response. Adjacent to each illustrated Connect option, a Select field is provided for
selecting a follow-up action, which may include fetching data.

A Send option is provided for enabling Send of the selected follow-up action
including any embedded data. A follow-up action may be any type of configured
response such as send a new radio dialog, send a machine repair request, and so on. A
send to option and an associated select option are provided for identifying a recipient of a
follow-up action and enabling automated send of the action to the recipient. For example,
if a first dialog is a request for machine repair service sent to a plurality of internal repair
technicians, then a follow-up might be to send the same dialog to the next available
contact in the event the first contact refused to accept the job or was not available at the
time of deployment.

In the above case, the dialog may propagate from contact to contact down a list
until one of the contacts is available and chooses to interact with the dialog by accepting
the job. A follow-up in this case may be to send a new dialog to the accepting contact
detailing the parameters of which machine to repair including the diagnostic data of the
problem and when the repair should take place. In this example, an option for showing
details is provided for developer review purposes. Also, interactive options for creating
new or additional responses and for deleting existing responses from the system are
provided. It is noted herein that once a dialog and dialog responses are created, they
are reusable over the whole of the voice application and in any specified sequence in a
voice application.
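
The propagation of a radio dialog down a contact list can be sketched in Python as a small rule table. This is an illustrative sketch only: the rule encoding, the dialog names "repair request" and "repair details", and the treatment of an unavailable contact as a "no" answer are assumptions made for the example.

```python
# Assumed rule table: expected response -> (send-to target, follow-up dialog).
RULES = {
    "yes": ("same contact", "repair details"),
    "no": ("next contact", "repair request"),
}

def propagate(contacts, answers, rules=RULES):
    """Offer the dialog to each contact in turn until one accepts the job."""
    transcript = []
    for contact in contacts:
        # A contact with no recorded answer is treated as unavailable ("no").
        answer = answers.get(contact, "no")
        send_to, follow_up = rules[answer]
        transcript.append((contact, answer, follow_up))
        if send_to == "same contact":
            return contact, transcript
    return None, transcript

accepted_by, transcript = propagate(
    ["John Doe", "Jane Doe", "Jeff Doe"],
    {"John Doe": "no", "Jane Doe": "yes"},
)
```

Here the first contact declines, so the same request moves on, and the second contact's acceptance triggers the follow-up dialog carrying the repair details.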

A section 1203 is provided within screen 1200 and adapted for handling Route-To
Connection Exceptions. This section enables a developer to configure what to do in case
of possible connection states experienced in application deployment. For example, for a
Caller Reject, Line Busy, or connection to Voice Mail there are options for No Action
and for Redial illustrated. It is noted herein that there may be more Exceptions as well as
Follow-up action types than are illustrated in this example without departing from the
spirit and scope of the present invention.

A Send option is provided for each type of exception for re-sending the same or
any other dialog that may be selected from an adjacent drop down menu. For example, if
the first dialog is a request for repair services and all of the initial contacts are busy,
the dialog may be sent back around to all of the contacts until one becomes
available by first moving to a next contact for send after each busy signal and then
beginning at the top of the list again on re-dial. In this case John Doe represents a next
recipient after a previous contact rejects the dialog, is busy, or re-directs to voice mail
because of unavailability. Section 1203 is only enabled when the voice application is set
to outbound. Once the first dialog is created and enabled by the developer, a second
dialog may be created if desired by clicking on one of the available buttons labeled detail.
Also provided are interactive buttons for Save Dialog, Save and Close, and Undo
Changes.
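
The re-dial behavior for Route-To Connection Exceptions can be sketched in Python as a round-robin loop. This is a minimal sketch under assumptions: the outcome strings, the max_attempts cap and the place_call callback are hypothetical and stand in for the telephony layer.

```python
from itertools import cycle, islice

# Outcomes treated as connection exceptions in this sketch.
EXCEPTIONS = {"caller reject", "line busy", "voice mail"}

def dispatch(contacts, place_call, max_attempts=6):
    """Try contacts in order, starting again at the top of the list on re-dial."""
    attempts = []
    for contact in islice(cycle(contacts), max_attempts):
        outcome = place_call(contact)
        attempts.append((contact, outcome))
        if outcome not in EXCEPTIONS:
            return contact, attempts
    return None, attempts  # attempt budget used up without a connection

# Scripted outcomes standing in for real call results.
outcomes = iter(["line busy", "voice mail", "connected"])
reached, attempts = dispatch(["John Doe", "Jane Doe"], lambda contact: next(outcomes))
```

The attempt cap is a design choice added for the sketch so that the loop ends even when every contact stays unavailable.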

Fig. 13 is a screen shot 1300 of dialog design panel 1200 of Fig. 12 illustrating
progression of dialog state to a subsequent contact. The dialog state configured in the
example of Fig. 12 is now transmitted from a contact listed in Route From to a contact
listed in Route To in section 1301, which is analogous to section 1201 of Fig. 12. In this
case, the contacts involved are John Doe and Jane Doe. In this case, the dialog name and
description are the same because the dialog is being re-used. The developer does not
have to re-enter any of the dialog context. However, because each dialog has a unique
relationship with a recipient, the developer must configure the corresponding business
rules.

Sections 1302 and 1303 of this example are analogous to sections 1202 and 1203
of the previous example of Fig. 12. In this case, if John Doe says no to the request for
machine repair then the system carries out a bridge transfer to Jane Doe. In the case of
exceptions, shown in Route-To Connection Exceptions region 1303, all the events are
directed to a redialing routine. In addition to inserting keywords such as "Yes" or "No"
in the response field 1302, the developer can create a custom thesaurus by clicking on a
provided thesaurus icon not shown in this example. All the created vocabulary in a
thesaurus can later be re-used throughout any voice applications the developer creates.

Fig. 14 is a screen shot of a thesaurus configuration window 1400 activated from
the example of Fig. 13 according to a preferred embodiment. Thesaurus window 1400
has a section 1401 containing a field for labeling a vocabulary word and an associated
field for listing synonyms for the labeled word. In this example, the word no is associated
with probable responses no, nope, and the phrase "I can not make it". In this way voice
recognition regimens can be trained in a personalized fashion to accommodate for
varieties in a response that might carry a same meaning.
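
A thesaurus of this kind can be modeled in Python as a mapping from each labeled word to its synonyms. The sketch below is illustrative only: the "yes" entry, the text normalization, and the treatment of "cannot" as "can not" are assumptions added for the example and are not part of the described window.

```python
# Labeled word -> probable responses. The "no" entry follows the example in
# the text; the "yes" entry is an assumption added for illustration.
THESAURUS = {
    "no": ["no", "nope", "I can not make it"],
    "yes": ["yes"],
}

def normalize(text):
    """Lower-case, collapse whitespace, and drop trailing punctuation."""
    collapsed = " ".join(text.lower().replace("cannot", "can not").split())
    return collapsed.rstrip(".!?")

def build_lookup(thesaurus):
    """Invert the thesaurus so every synonym points at its labeled word."""
    return {
        normalize(synonym): label
        for label, synonyms in thesaurus.items()
        for synonym in synonyms
    }

def recognize(utterance, lookup):
    """Return the labeled word for an utterance, or None when unknown."""
    return lookup.get(normalize(utterance))

lookup = build_lookup(THESAURUS)
label = recognize("I cannot make it.", lookup)
```

Inverting the table once lets each response be resolved with a single dictionary lookup, and the same lookup can be shared by every dialog that reuses the vocabulary.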

A vocabulary section 1402 is provided and adapted to list all of the created
vocabulary words for a voice application and a selection mechanism (a selection bar in
this case) for selecting one of the listed words. An option for creating a new word and
synonym pair is also provided within section 1402. A control panel section 1403 is
provided within window 1400 and adapted with the controls Select From Thesaurus;
Update Thesaurus; Delete From Thesaurus; and Exit Thesaurus.

Fig. 15 is a plan view of a developer's frame 1500 illustrating a screen 1502 for
managing created modules according to an embodiment of the present invention.

After closing all dialog windows, frame 1500 displays screen or page 1502 for
module management options. Menu section 803 is again visible. Screen 1502 displays as
a result of clicking on the option "My" or My Fonelet in frame 803. Screen 1502 lists all
voice applications that are already created and usable. In the list, each voice application
has a check box adjacent thereto, which can be selected to change state of the particular
application. A column labeled Status is provided within screen 1502 and located adjacent
to the list of applications already created.

The Status column lists the changeable state of each voice application. Available
status options include but are not limited to listed states of Inactive, Activated and
Inbound. A column labeled Direct Access ID is provided adjacent to the Status column
and is adapted to enable the developer to access a voice application directly through a
voice interface in a PSTN network or in one embodiment from a DNT voice interface. In
a PSTN embodiment, direct access ID capability serves as an extension of a central phone
number. A next column labeled Action is provided adjacent to the direct access ID
column and is adapted to enable a developer to select and apply a specific action
regarding state of a voice application.

For example, assume that a developer has just finished the voice application
identified as Field Support Center (FSC) listed at the top of the application identification
list. Currently, the listed state of FSC is Inactive. The developer now activates the
associated Action drop down menu and selects Activate to launch the application FSC on
demand. In the case of a scheduled launch, the voice application is activated
automatically according to the settings defined in the voice application shell.

As soon as the Activate command has been issued, the on-demand request is
queued for dispatching through the system's outbound application server. For example,
John Doe then receives a call originating from the voice application server (110) that asks
if John wants to take the call. If John responds "Yes," the voice application is executed.
The actual call flow follows:

System: "Hello John, you received a fonelet from Jim Doe, would you like to
take this call?"
John: "Yes."
System: "Machine number 008 is broken, are you available to fix it?"
John: "No."
System: "Thanks for using fonelet. Goodbye!"
System: Terminate the connection with John, record the call flow to the data source,
and spawn a new call to Jane Doe.
System: "Hello Jane, you received a fonelet from Jim Doe, would you like to take
this call?"
Jane: "Yes."
System: "Machine number 008 is broken, are you available to fix it?"
Jane: "I cannot make it."
System: "Please wait while fonelet transfers you to Jeff Doe."
System: Carry out the bridge transfer between Jane Doe and Jeff Doe. When the
conversation is completed, terminate the connection with Jeff and record the call
flow to the data source.

The default textual content of the voice application is generated by the
text-to-speech engine hosted on the telephony or DNT server. However, the voice application
producer can access the voice portal through the PSTN or DNT server and record his/her
voice over any existing prompts in the voice application.

It will be apparent to one with skill in the art that the method and apparatus of the
present invention may be practiced in conjunction with a CTI-enabled telephony
environment wherein developer access for application development is enabled through
a client application running on a computerized station connected to a data network also
having connectivity to the server spawning the application and telephony components.
The method and apparatus of the invention may also be practiced in a system that is
DNT-based wherein the telephony server and application server are both connected to a
data network such as the well-known Internet network. There are applications for all
mixes of communications environments including any suitable multi-tier system enabled
for VXML and/or other applicable mark-up languages that may serve similar purpose.

It will also be apparent to one with skill in the art that modeling voice applications
including individual dialogs and responses enables any developer to create a limitless
variety of voice applications quickly by reusing existing objects in modular fashion,
thereby enabling a wide range of useful applications from an existing store of objects.

Auto-Harvesting Web Data

In one embodiment of the present invention one or more Websites can be
automatically harvested for data to be rendered by a VXML engine for generating a voice
response accessible by users operating through a PSTN-based portal. Such an
enhancement is described immediately below.

Fig. 16 is a block diagram illustrating the dialog transition flow of Fig. 6 enhanced
for Web harvesting according to an embodiment of the present invention. Dialog
controller 604 is enhanced in this embodiment to access and harvest data from an HTML,
WML, or other data source such as would be the case of data hosted on a Website. An
example scenario for this embodiment is that of a banking institution allowing all of its
customers to access their Web site through a voice portal.

A Website 1600 is illustrated in this embodiment and is accessible to dialog
controller 604 via a network access line 1601 illustrated herein as two directional lines of
communication. The first line is labeled Store/Fetch/Input leading from controller 604
into site 1600. The second (return) line is labeled Data Return/Source Field. The
separately illustrated communication lines are intended to be analogous to a bi-directional
Internet or other network access line. An internal data source (602) previously described
with reference to Fig. 6 above is replaced in Fig. 16 by Website 1600 for explanatory
purpose only. It should be noted that multiple data sources both internal to server 110
and external from server 110 could be simultaneously accessible to dialog controller 604.

Website 1600 provides at least one electronic information page (Web page) that is
formatted according to the existing rules for the mark-up language that is used for its
creation and maintenance. Site 1600 may be one site hosting many information pages,
some of which are inter-related and accessible through subsequent navigation actions.
Controller 604 in this embodiment is enhanced for Website navigation at the direction of
a user's voice inputs, enabled by rules accessible through rule engine 603. A data
template (not shown) is provided for use by dialog controller 604 to facilitate logical data
population from site 1600. Dialog controller 604 analyzes both Website source code and
data fields as return data and uses the information to generate a VXML page for rendering
engine 111.
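
The step of turning harvested page source into a VXML page can be sketched in Python using the standard-library HTML parser. This is a simplified illustration, not the method of dialog controller 604: the sample page, the choice to harvest only input elements, the prompt wording, and the class and function names are all assumptions.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

class FieldHarvester(HTMLParser):
    """Collect the name of every <input> element found in the page source."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.fields.append(name)

def to_vxml(field_names):
    """Build a simplified VXML form with one voice field per harvested name."""
    parts = ['<vxml version="2.0">', "  <form>"]
    for name in field_names:
        parts.append(f"    <field name={quoteattr(name)}>")
        parts.append(f"      <prompt>Please say your {escape(name)}.</prompt>")
        parts.append("    </field>")
    parts += ["  </form>", "</vxml>"]
    return "\n".join(parts)

# Hypothetical login page standing in for source code harvested from a site.
page = "<form><input name='account'><input type='password' name='pin'></form>"
harvester = FieldHarvester()
harvester.feed(page)
vxml = to_vxml(harvester.fields)
```

Each harvested field becomes a voice prompt, so the caller supplies by speech the same values that a Web user would type into the page.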

It is noted herein that all of the security and access mechanisms used at the site for
normal Internet access are conferred upon the customer so that the customer may be
granted access by providing a voice rendering (response) containing the security access
information. This enables the customer to keep the same security password and/or
personal identification number (PIN) for voice transactions through a portal as well as for
normal Web access to site 1600 from a network-connected computer.

Fig. 17 is a block diagram of the voice application distribution environment of
Fig. 1B illustrating added components for automated Web harvesting and data rendering
according to an embodiment of the present invention. In this example, workstation 140
running client software 141 has direct access to a network server 1701 hosting the target
Website 1600. Access is provided by way of an Internet access line 1704.

It is noted herein that there may be many servers 1701 as well as many hosted
Websites of one or more pages in this embodiment without departing from the spirit and
scope of the present invention. A database store 1702 is provided in this example and
illustrated as connected to server 1701 for the purpose of storing data. Data store 1702
may be an optical storage, magnetic storage, a hard disk, or other forms suitable for
storing data accessible online. In one embodiment, data store 1702 is a relational
database management system (RDBMS) wherein a single access may involve one or
more connected sub servers also storing data for access.

The configuration of client application 141, workstation 140, server 1701,
Website 1600, and database 1702 connected by network 1704 enables Websites
analogous to site 1600 to be culled or harvested. Application 141 can read and retrieve
all of the default responses that exist for each HTML script or scripts of another mark-up
language. These default responses are embedded into application logic 112 and VXML
rendering engine 111. Once the content of a Web page has been culled and used in client
141 to create the rendering, VXML engine 111 can access the Website successfully
in combination with application logic 112 and database/resource adaptor 113 by way of a
separate access network 1703. For example, if a user (not shown) accesses Website 1600
through voice portal 143 from receiving device 135 (telephone), then he or she would be
voice prompted for a password to gain access to the site. Subsequently, a voice rendering
of the data on the site accessed would be recited to him or her over telephone 135.

Generally speaking, the development process for a voice portal would be the same
as was described above with reference to Figs. 9-15. Some additional scripting or
input of dialog is performed using client application 141. Rather than requiring that the
application developer populate all of the fields from scratch, or re-apply previously
entered options, fields used by the business logic as discussed earlier in Figs. 9 through
15 may be created from information harvested from site 1600 in this case. For that
purpose, a software adapter (not shown) is added to client software 141 that allows it to
communicate with Web site 1600 and harvest the information, both from the source code
comprising fields and labels, etc. as well as from data parameters and data variables.

It is noted herein that the process for data access, retrieval and voice rendering is 
essentially the same with respect to the processes of Figs. 2-5 above except that a Website 
connection would be established before any other options are selected. 

In one embodiment, provision of connection 1703 between server 110 and server
1701 enables the security environment practiced between communicating machines, such
as secure socket layer (SSL), firewall, etc., to be applied in the created voice solution for a
customer. On the analog side, the security is no different than that of a call-in line
allowing banking services in terms of wiretap possibilities etc.

It will be apparent to one with skill in the art that the method and apparatus of the
invention can be practiced in conjunction with the Internet, an Ethernet, or any other
suitable network. Markup languages supported include HTML, SHTML, WML,
VHTML, XML, and so on. In one embodiment, the Websites accessed may be accessed
automatically wherein the password information for a user is kept at the site itself. There
are many possible scenarios.

Prioritizing Web Data for Voice Rendering

According to one aspect of the present invention a method is provided for
selecting and prioritizing which Web data offerings from a harvested Web site will be
filled into a template for a voice application.

Fig. 18 is a block diagram illustrating a simple hierarchical structure tree of a Web
site 1801 and a harvested version of the site 1810. Screen 1801 illustrates a simple Web
site structure tree as might be viewed from a user interface. Selectable icons representing
data elements are represented herein as solid lines 1802a through 1802n suggesting that
there may be any number of icons provided within any exemplary Web site. For the
purpose of this specification, icons 1802a-1802n represent selectable icons, logos,
hyperlinks and so on. Classifications of each object 1802a-1802n are illustrated herein as
text labels 1803a through 1803n. For example, a selectable icon 1802a is one for
navigating to the "home page" of the site as revealed by adjacent classification 1803a. A
subsequent icon (1802b) is for a login page of the site as revealed by the classification login.
In some cases, icons and classifications or labels may be one and the same (visibly not
different).

In this example, the hierarchical structure presents a login block, which the user
must successfully navigate before other options are presented. The presented options
Accounts, Status, History, Look-up, Trade, and Quotes are arranged in a hierarchical
structure. For example, one must access Accounts first before options for Status
(Accounts/Status) or History (Accounts/Status/History) are available to the user. This
standard structure may be inconvenient and uneconomical for template filling for the
purpose of creating a voice application template for dialog navigation. One reason is that
the voice application will be created with an attempt to use all of the data of the Web site,
which likely will include graphics, charts and the like that would not be understood by an
accessing user if the description is simply translated and recited as a voice dialog over the
telephone. Another reason is that the generic hierarchy of Web site structure 1801 may
not be of a desired hierarchy for rendering as voice dialog in a request/response format.
Typically then, certain data will be valuable, certain data will not be valuable, and the
order in which data is presented at the dialog level will be important to the user as well as to the
administrator (service provider).

Screen 1810 represents the same structure of screen 1801 that has been completely
harvested wherein all of the icons and elements identified in source code of the site have
been obtained for possible template filling. It is noted that the template enables a voice
application to operate with the goal of obtaining and rendering updated data according to the
constraints established by an administrator. Web site 1810 is pre-prepared for template
filling. Icons are labeled 1812a through 1812n and classifications are labeled 1813a 
through 1813n. 

Object 1810 is generated to emulate the generic structure of the Web site 
including graphics, charts, dialog boxes, text links, data fields, and any other offered 
feature that is present and enabled in the HTML or other language of the site. Because of 
the mitigating factors involved with a potentially large number of users accessing a voice 
portal to receive dialog, much streamlining is desired for user convenience as well as 
network load stabilization. Therefore, an intermediate step for object modeling elements 
and reorganizing the tree hierarchy is needed so that a voice application template can be 
filled according to a desired selection and hierarchy thus facilitating a more economic, 
optimized construction and execution of a resulting voice application. 

The object modeling tools of the invention can be provided as part of client 
application 141 described with reference to Fig. 1B above. Created objects organized by
hierarchy and desired content can be stored in application server 110 described with
reference to Fig. 6 above or in a local database accessible to voice application server 110.

Fig. 19 is a block diagram illustrating the Web site structure 1801 of Fig. 18 and a 
Web site object created and edited for template creation. Screen 1801 is analogous to
screen 1801 of Fig. 18 both in element and description thereof; therefore none of the
elements or description of the elements illustrated with respect to structure 1801 of Fig.
18 shall be reintroduced.

Screen 1910 represents a harvested Web site that started out with structure 1801,
but has since been reorganized with element prioritization for the purpose of populating a
voice application template in an optimized fashion. It can be seen in this example that
significant editing has been performed to alter the original content and structure of the
harvested Web site. Icons 1912a through 1912n illustrate the icons that have been
retained after harvesting. Elements 1913a through 1913n represent the classifications of
those objects. Firstly, an optimization is noted with respect to icons labeled Home and Login in
structure 1801. These items in harvested object 1910 have been optimized through
combination into one specified object labeled login and given the element number 1913a.
In this case, Account Status and History is streamlined to Balance, the most valuable piece
and the most commonly requested information. Also in this case, any charts, graphs or
other visuals that may not be understood if rendered as a voice dialog are simply
eliminated from the voice application template. The intermediate step for organization
before template filling would be inserted in between steps of harvesting the Web site data
and populating the voice application header.

After successful login, wherein the user inputs a voice version of the PIN/User 
Name/Password combination and is granted access to the voice application from a voice 
portal, the next priority in this example is to enable the user to quickly determine his or 
her account balance or balances. Element numbers 1912b and 1912c represent two balances
assuming two accounts. There may be more or fewer prioritized icons without departing
from the scope of the invention. In this case, the first "voice option" provided through
the optimization process is to have account balances recited by telephone to the
participating user. The other present and offered options of Look-up, Trade, and Quote,
illustrated herein by element numbers 1913c through 1913f, are moved into a higher but same
level of architecture or structure, meaning that they are afforded the same level of
importance. All three of these options are related in that a user request or response 
containing stock symbol information can be used to initiate any of the actions. 

Fig. 20 is a process flow diagram illustrating added steps for practicing the 
invention. At step 2000, an administrator operating client application 141 described with
reference to Fig. 17 above harvests the Web site for source data and data structure. At
step 2001, the administrator creates an editable object representing the existing structure 
hierarchy of the target Web site. The object tree has the icons and associated properties 
and is executable when complete. In one embodiment, many of the standard icons and 
properties shared by many Web sites are provided for the administrator so that simple 
drag and drop operations can be used to create the tree. If a developer has to create a 
specific object from scratch, the source mark-up language can be used to construct the 
object from object building blocks representing object components. The new objects can 
then be saved to storage and re-used. 

In one embodiment, rendering the source description as instruction to a modeling 
engine automatically creates the object tree. In this case, the harvested object is presented 
to the administrator as harvested and "ready to edit" wherein steps 2000 and 2001 are 
largely if not completely transparent to the administrator. In another embodiment, the 
administrator simply drags and drops icons using a mouse provided with the workstation 
employed to do the modeling. 

At step 2002, the administrator may edit some objects to make them fit the 
constraints of VXML voice rendering more completely. In the same step he or she may
delete certain objects from the tree altogether. Still further in the same step the 
administrator may move and group objects according to priority of rendering. If a Web 
site contains a login requirement it will, of course, be the highest priority or the first 
executable dialog of the resulting voice application. Complicated logins may be 
simplified. Moreover one or more objects can be combined to be rendered in a same 
dialog. There are many possibilities. 
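
The editing operations of step 2002 (deleting objects, combining objects, and ordering them by rendering priority) can be sketched as follows. This is a minimal illustration only: the SiteObject class, its kind and priority fields, and the helper names are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical object kinds that cannot be recited meaningfully by telephone.
VOICE_UNFRIENDLY = {"chart", "graph", "image"}


@dataclass
class SiteObject:
    """One node of the editable object tree (illustrative model only)."""
    label: str
    kind: str = "link"
    priority: int = 100  # lower values are rendered earlier
    children: List["SiteObject"] = field(default_factory=list)


def prune(node: SiteObject) -> SiteObject:
    """Delete objects, at any depth, whose kind is unsuitable for voice."""
    node.children = [prune(c) for c in node.children
                     if c.kind not in VOICE_UNFRIENDLY]
    return node


def combine(parent: SiteObject, labels: Set[str], new_label: str,
            kind: str, priority: int) -> SiteObject:
    """Merge the named children of parent into one object for one dialog."""
    merged = SiteObject(new_label, kind, priority)
    kept = []
    for child in parent.children:
        if child.label in labels:
            merged.children.extend(child.children)
        else:
            kept.append(child)
    parent.children = [merged] + kept
    return merged


def dialog_order(node: SiteObject) -> List[str]:
    """List labels in rendering order: by priority per level, depth first."""
    order = []
    for child in sorted(node.children, key=lambda c: c.priority):
        order.append(child.label)
        order.extend(dialog_order(child))
    return order


site = SiteObject("site", children=[
    SiteObject("Home"),
    SiteObject("Login", kind="login"),
    SiteObject("Quotes", kind="data", priority=3),
    SiteObject("Performance chart", kind="chart"),
    SiteObject("Accounts", kind="data", priority=2),
])
prune(site)
combine(site, {"Home", "Login"}, "Login", kind="login", priority=1)
print(dialog_order(site))  # ['Login', 'Accounts', 'Quotes']
```

Here the login object receives the lowest priority value so that it becomes the first dialog, matching the statement that a login requirement is the first executable dialog of the resulting voice application.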

In still another embodiment, an object tree may be flattened to one level or an 
object tree may be expanded to contain more levels. The administrator may also insert 
content (rendered to dialog) that was not originally available from the Web site. The new 
content may be placed anywhere in the object tree and will subsequently take its place of
priority in the resulting dialogs of the voice application. Once the voice application is
complete, the initiation and execution of the application leads to data access and retrieval
of any new data at the site. A standard navigation template is used to access the site and
data is retrieved only according to class of data identified in the object tree. In this way
unwanted data is not repeatedly accessed from a same Web site.
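
As a rough sketch of retrieval "only according to class of data identified in the object tree", the fragment below flattens a nested tree and filters harvested records by class. The nested-dictionary representation, the record layout, and the function names are assumptions made for illustration.

```python
def classes_in_tree(tree):
    """Collect every data class named anywhere in the nested object tree."""
    found = set()
    for label, subtree in tree.items():
        found.add(label)
        found |= classes_in_tree(subtree)
    return found


def flatten(tree):
    """Flatten the tree to a single level, keeping depth-first order."""
    flat = {}
    for label, subtree in tree.items():
        flat[label] = {}
        flat.update(flatten(subtree))
    return flat


def retrieve(site_records, tree):
    """Keep only records whose class appears in the object tree."""
    wanted = classes_in_tree(tree)
    return [record for record in site_records if record["class"] in wanted]


# Hypothetical edited object tree and harvested records.
object_tree = {"Login": {"Balance": {}, "Quote": {}}}
harvest = [
    {"class": "Balance", "value": "1,250.00"},
    {"class": "Chart", "value": "<binary image>"},
    {"class": "Quote", "value": "ACME 12.50"},
]
print([r["class"] for r in retrieve(harvest, object_tree)])  # ['Balance', 'Quote']
print(list(flatten(object_tree)))  # ['Login', 'Balance', 'Quote']
```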

In step 2003, the voice application template is populated as described above. At 
step 2004, the administrator can begin to parameterize the voice application execution
including establishment of all of the CTI contact parameters. At step 2005, the 
administrator can create dialog. 

It will be apparent to one with skill in the art that pre-organizing Web harvested 
content for voice rendering is an extremely useful step for reducing complexity, reducing 
network and processor load, and for providing only pertinent and useful voice renderings
to users accessing or contacted in the sense of outbound dialing from a connected voice 
portal system. 

Enhanced Security 

Fig. 21 is a block diagram illustrating a secure connectivity between a Voice 
Portal and a Web server according to an embodiment of the invention. 

The connection scheme illustrated in this example connects a user (not shown) 
accessing a voice portal 2106 wherein portal 2106 has network access to Web-based data 
illustrated herein within Internet 2108, more particularly from a Web server 2109
connected to a database 2110.

Voice portal 2106 comprises a voice application server (VAS) 2103 connected to 
an XML gateway 2104 by way of a data link 2105. In this embodiment, data hosted by
server 2109 is culled therefrom and delivered to XML gateway 2104 by way of line
2107. Application server 2103 then generates voice applications and distributes them to
users having telephone connection to PSTN 2101. Telephony switches, service control
points, routers and CTI-enabled equipment known to telephony networks may be
assumed present within PSTN 2101. Similarly, routers, servers and other nodes known in
the Internet may be assumed present in Internet 2108. The inventor deems the illustrated
equipment sufficient for the purpose of explanation of the invention. 

Typically, a voice access to voice portal 2103 from anyone within PSTN 2101 
may be assumed to be unprotected whether it is an inbound or an outbound call. That is
to say that anyone with a telephone line tapping capability can listen in on voice
transactions conducted between users' phones and the voice application server. Typically,
prior art conventions with phone transactions such as IVR entry of social security and
PIN identification are sufficient to access account information. However, anyone else with
the same information can also access the user's automated account lines to find out
balance information and so on.

Server 2109 may be protected with Web certificate service wherein a user (on-line)
accessing any data from server 2109 must send proof of acceptance and signature of
the online authentication certificate. These regimens are provided as options in a user's
browser application.

One way to extend security to the point of XML gateway 2104 is through a
completely private data network. A less expensive option is a VPN network as is
illustrated in this example. Another way is through SSL measures such as HTTPS. Any
of these methods may be used to extend the security regimens of server 2109 to Voice
portal 2106. In this embodiment, gateway 2104 is adapted to operate according to the
prevailing security measures. For example, if a user goes online to server 2109, changes
his or her password information and signs a Web authentication certificate, the same
change information would be recorded at the voice portal.
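
For the HTTPS option, a gateway-side retrieval might look like the sketch below, which uses Python's standard ssl and urllib modules. The function names and the policy of refusing non-HTTPS URLs are assumptions for illustration; the specification does not prescribe an implementation.

```python
import ssl
import urllib.request
from urllib.parse import urlparse


def build_gateway_context() -> ssl.SSLContext:
    """TLS context that verifies the Web server certificate and host name."""
    context = ssl.create_default_context()
    # Assumed policy: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def fetch_for_gateway(url: str, context: ssl.SSLContext,
                      timeout: float = 10.0) -> bytes:
    """Retrieve data for the XML gateway, refusing unencrypted transport."""
    if urlparse(url).scheme != "https":
        raise ValueError("gateway only retrieves data over HTTPS")
    with urllib.request.urlopen(url, context=context,
                                timeout=timeout) as response:
        return response.read()
```

A caller would build the context once and pass it to every retrieval so that the same certificate checks a browser user relies on also apply to traffic between the Web server and the gateway.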

The only security lapse then is between a user in the PSTN and portal 2106.
Information sent as voice to any user and response voice sent from any user can be
obtained by tapping into line 2102. One possible solution to protect privacy to some
extent would be to use a voice translation mechanism at the voice portal and at the user
telephone. In this way, the voice leaving the portal can be translated to an obscure
language or even code. At the user end, the device (not shown) translates back to the
prevailing language and plays on a delay over the telephone speaker system.

One with skill in the art will recognize that an additional advantage of using the
existing security, VPN, SSL, etc. is that the security system has already been tested, and is
being constantly improved. One with skill in the art will also recognize that many
variations can be provided without departing from the spirit and scope of the invention.
For example, outsourced Web hosting may be used. Multi-site Web systems can be used
for redundancy. Outsourced voice services or multi-service/location voice services may
also apply.

Vocabulary Management for Recognition Options 

According to yet another aspect of the invention, the inventor provides a 
vocabulary management system and method that enhances optimization of voice 
recognition software. The method and apparatus is described in the enabling disclosure 
below. 

Fig. 22 is a block diagram illustrating the architecture of Fig. 1B enhanced with a
vocabulary management server 2200 and software 2201 according to an embodiment of
the present invention.

The system architecture of this embodiment is largely analogous to the
architecture discussed with reference to Fig. 1B above. Therefore, elements present in
both examples Fig. 1B and Fig. 22 shall not be reintroduced unless modified to practice
the present invention.

Vocabulary management server 2200 is adapted with an instance of vocabulary 
management software (VMS) 2201 for the purpose of tailoring voice recognition template 
options to just the required vocabulary to fully enable the instant voice application. 

Server 2200 may be presumed to have a data storage facility connected thereto or
held internally therein adapted for the purpose of warehousing and organizing data. With
regard to harvesting Web data and using the harvested Web data as source data for voice
dialog as described further above with reference to the example of Fig. 17, the Web-based
components are represented in this embodiment by Internet access lines, one connected
from workstation 140 giving it Web access and another connecting voice application
server 110 giving it access through database/resource adapter 113. In this way, Web
access to any targeted Web-based data for auto harvesting, interpretation, and translation
to voice dialog is assumed.

Server 2200 can be accessed from workstation 140 running client application 141 
through voice application server 2202 or more particularly through database resource 
adapter 113 over a data link 2203. In this way, an administrator can set up and
manipulate vocabulary options attributed to specific on-line or off-line (internal) data 
sources. 

VMS software 2201 is adapted to enable separate and segregated sets of 
vocabulary specific to certain target data accessed and functions allowed in conjunction
with the target data. In one embodiment, additional subsets of vocabulary of a same 
target data source can be provided that are further tailored to specific clients who access 
the data through interaction from portal 143 over PSTN 134. Rule sets specific to the 
created vocabulary sets are created and tagged to the specific vocabulary sets and 
provided to application logic 112. 
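
One way to picture segregated vocabulary sets with client-specific subsets and tagged rule sets is the small in-memory store below. The class names, the key layout (target, function, client), and the requirement that a client subset be drawn from its parent set are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class VocabularySet:
    """A vocabulary together with the rule identifiers tagged to it."""
    words: FrozenSet[str]
    rules: Tuple[str, ...] = ()


class VocabularyStore:
    """Hypothetical stand-in for the data held by the vocabulary server."""

    def __init__(self) -> None:
        self._parents: Dict[Tuple[str, str], VocabularySet] = {}
        self._subsets: Dict[Tuple[str, str, str], VocabularySet] = {}

    def add_parent(self, target: str, function: str,
                   words: Iterable[str], rules: Iterable[str] = ()) -> None:
        self._parents[(target, function)] = VocabularySet(
            frozenset(words), tuple(rules))

    def add_client_subset(self, target: str, function: str, client: str,
                          words: Iterable[str],
                          rules: Iterable[str] = ()) -> None:
        parent = self._parents[(target, function)]
        subset = frozenset(words)
        if not subset <= parent.words:
            raise ValueError("client words must come from the parent set")
        # A subset without its own rules inherits the parent's rule set.
        self._subsets[(target, function, client)] = VocabularySet(
            subset, tuple(rules) or parent.rules)

    def lookup(self, target: str, function: str,
               client: Optional[str] = None) -> VocabularySet:
        """Prefer the client's subset; otherwise fall back to the parent."""
        if client is not None:
            subset = self._subsets.get((target, function, client))
            if subset is not None:
                return subset
        return self._parents[(target, function)]


store = VocabularyStore()
store.add_parent("investments", "trade",
                 {"buy", "sell", "quote", "margin"},
                 rules=["confirm-before-trade"])
store.add_client_subset("investments", "trade", "client-42", {"buy", "sell"})
print(sorted(store.lookup("investments", "trade", "client-42").words))
# ['buy', 'sell']
```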

VXML compliant telephony server 130 has a text-to-speech and a speech-to-text 
capable engine 2205 provided therein as an enhanced engine replacing engine 132 
described with reference to Fig. 1B. In one embodiment the separate functions may be
enabled by separate components. The inventor illustrates a single engine with dual 
capabilities for illustrative purpose only. Engine 2205 has access to vocabulary 
management server 2200 through a data link 2202. 

Server 2200 is accessible from application logic 112 of voice application server 
110 by way of a data link 2204 and from database resource adapter 113 by way of a data
link 2203. In one embodiment, a single data link is sufficient to enable communication 
between the just-mentioned components in voice application server 100 and server 2200. 

In practice of the invention, assuming a Web-based data source is accessed, the 
voice recognition operates in a different way from previously described embodiments. 
For example, assume a client is accessing voice portal 143 in PSTN 134 from telephone
135 to interact with his or her personal investment Web page that contains options for
account balance rendering and for stock trading. A specific vocabulary for the target Web
site is available in server 2200 managed by VMS 2201. Perhaps a sub-set of the
vocabulary particular to the client also exists and is organized under the parent 
vocabulary set. 

Telephony server 130 recognizes the accessing user and an existing voice 
application is triggered. Voice application server 2202 connects to the Web site on behalf 
of the user through database resource adapter 113 and the Internet access line. Following
the constraints of the voice application template, the database resource adapter provides 
the user login and password information after the user communicates these in the first or 
opening dialog and then gets the account data and any other updated data that the user is 
entitled to. The first dialog response rendered to the user from the voice application may 
contain only the stock values pertinent to the user account and the existing monetary 
balances associated with the specific symbols. While there may be more information 
available to the user, some of the available information may not be pertinent to or useful 
to the user. Therefore, before each dialog rendering, VMS 2201 provides the appropriate 
vocabulary and rule set for the particular dialog function, in some cases particular as well 
to the accessing user. Therefore, voice recognition software is not required to search a 
large vocabulary to interpret the rendered VXML page. In this case, the VXML page
itself is limited by the vocabulary management function before it is delivered to telephony 
server 130. 
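
Limiting a VXML page by vocabulary before delivery can be illustrated with the sketch below, which removes menu choices that fall outside the active vocabulary. The menu and choice elements are standard VoiceXML; the function name and the sample page (which omits the VoiceXML namespace for brevity) are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Set


def limit_menu(vxml_text: str, vocabulary: Set[str]) -> str:
    """Drop every <choice> whose spoken text is not in the vocabulary."""
    root = ET.fromstring(vxml_text)
    for menu in root.iter("menu"):
        # Copy the list first because the menu is modified while looping.
        for choice in list(menu.findall("choice")):
            spoken = (choice.text or "").strip().lower()
            if spoken not in vocabulary:
                menu.remove(choice)
    return ET.tostring(root, encoding="unicode")


PAGE = """<vxml version="2.0">
  <menu>
    <prompt>Please choose an option.</prompt>
    <choice next="#balance">balance</choice>
    <choice next="#trade">trade</choice>
    <choice next="#chart">chart</choice>
  </menu>
</vxml>"""

limited = limit_menu(PAGE, {"balance", "trade"})
remaining = [c.text for c in ET.fromstring(limited).iter("choice")]
print(remaining)  # ['balance', 'trade']
```

Because the recognizer then only has to distinguish the remaining choices, it does not need to search a large vocabulary for this dialog.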

In another embodiment, intervention from VMS 2201 may occur after the
standard VXML page is rendered but before voice recognition begins in server 130. In 
this case, engine 2205 consults server 2200 to obtain the appropriate vocabulary 
constraints. In this example data not recognized from VXML is simply dumped. There 
are many differing points along the dialog process where VMS 2201 may be employed to 
streamline the voice recognition function. For example, in the first dialog response 
described further above, the user may be prompted to initiate any desired trading activity. 
If the user elects to do some trading then the speech to text portion of engine 2205 may
consult VMS 2201 for a limited trading vocabulary that is tailored to that client. Such a
vocabulary may be expanded for a different client that is, for example, a VIP and has
perhaps more allowable options. Voice renderings from the client that do not match the
provided vocabulary and/or do not conform to the rules are ignored.

In addition to personalizing and streamlining vocabulary options for voice
recognition, an administrator can use VMS to create new vocabulary and/or to create a
plurality of synonyms that are recognized as a same vocabulary word. For example, an
administrator may configure stock, share, and security as synonyms to describe paper.
Sell, short, and dump may all be understood as synonyms for selling paper. There are
many variant possibilities. In general, VMS 2201 can be applied in one communication
direction (from service to user) as a management tool for limiting data on a VXML page
for rendering, or for limiting voice recognition of the VXML page and dumping the
unrecognized portion. VMS 2201 can be applied in dialog steps in the opposite direction
(from user to service) to tailor voice recognition options allowed for a user or a user
group according to service policy and constraint.
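
The synonym handling described above can be sketched as a lookup from each configured synonym to one canonical vocabulary word, with unmatched words ignored. The synonym groups come from the example in the text; the function names and the whitespace tokenization are assumptions.

```python
from typing import Dict, Iterable, List

# Canonical word mapped to the synonyms an administrator has configured.
SYNONYM_GROUPS = {
    "paper": ["stock", "share", "security"],
    "sell": ["sell", "short", "dump"],
}


def build_index(groups: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map every synonym (and the canonical word itself) to the canonical."""
    index = {}
    for canonical, synonyms in groups.items():
        index[canonical] = canonical
        for word in synonyms:
            index[word.lower()] = canonical
    return index


def normalize(utterance: str, index: Dict[str, str]) -> List[str]:
    """Return canonical words in order, ignoring words outside the index."""
    return [index[token] for token in utterance.lower().split()
            if token in index]


INDEX = build_index(SYNONYM_GROUPS)
print(normalize("Short the stock", INDEX))  # ['sell', 'paper']
print(normalize("dump security", INDEX))    # ['sell', 'paper']
```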

In an embodiment where VMS 2201 works only with the VXML stream, it may 
be located within application server 110 or within telephony server 130. It is conceivable
that different dialogs (both initial and response dialogs) of a same voice application for a
same client accessing a single data source can be constrained using different vocabulary
sets using VMS 2201. Therefore the optimum level of management capability is at the
level of action/response. By limiting the work of voice recognition processing at every
available step during interaction, much processing power and bandwidth can be reserved
for other uses.

Local Cache Optimization (static, dynamic)



In yet another aspect of the present invention, a method and apparatus for reducing
data traffic is provided that uses local cache optimization in a VXML distribution
environment.

Fig. 23 is a block diagram illustrating various functional components of a VXML
application architecture 2300 including cache optimization components according to an
embodiment of the present invention. Fig. 23 is quite similar to Fig. 1, except that it is
updated and shows additional detail.

Architecture 2300 comprises basically a voice application server 2301, and a 
telephony server/voice portal 2302 as main components. Portal 2302 comprises a speech 
generator 2306 and a telephony hardware/software interface 2305. Portal 2302 is VXML 
compliant by way of inclusion of a VXML interpreter 2307 for interpreting VXML data 
sent thereto from application server 2301. Voice portal 2302 is maintained as an access
point within a telephony network such as the well-known PSTN network. However, 
portal 2302 may also be maintained on a wireless telephony network. 

A Web interface 2303 is illustrated in this example and serves as an access point 
from the well-known Internet or other applicable DPN. Voice portal 2302 may represent 
a CTI-enhanced IVR system, customer service point, or any other automated voice portal 
system. In the case of a Web-based portal, component 2303 may be a Web server, a 
computer connected to the Internet, or any other type of node that provides a user 
interface. 

Voice application server 2301 is similar in many respects to voice application
server 2202 described with reference to Fig. 22. In this regard, voice application server 2301 has
voice application development software (VADS) 2308 installed and executable thereon. 
VADS 2308 illustrated within the domain of voice application server 2301 has certain 
modules that shall herein be described using labels and shall not have element numbers 
assigned to them because of limited drawing space. Modules illustrated in VADS 2308 
include a contact manager (Contact Mgr.) instance adapted as a developer's tool for
managing the parameters of dialog recipients. A dialog controller (Dialog Ctrl.) is 
provided as a developer tool for creating and managing voice application dialogs and for 
initiating interface operations to rules sources and internal/external data sources. A 
Fonelet controller (Fonelet Ctrl.) is provided within VADS 2308 and adapted to control
the distribution of subsequent dialogs of a voice application. An XML generator (XML
Gen.) is provided within VADS 2308 and adapted to generate XML for VXML pages. 

Voice application server 2301 has application logic 2309 provided therein and 
adapted to control various aspects of application delivery, creation, and management. 
Application logic 2309 includes a rule manager (Rule Mgr.) for providing the enterprise 
rules for application creation and deployment via the contact manager and dialog 
controller referenced above, and rules for ongoing user and system interactions with 
running applications. A dialog runtime processor (Dialog Run T. Prcsr.) is provided and 
adapted to control the way a completed dialog of a voice application is launched and 
formatted. A Fonelet runtime processor (Fonelet Runtime Prscsr.) is provided within 
application logic 2309 and controls various and sundry aspects of how voice applications 
(Fonelets) are executed and choreographed in real time. A dynamic grammar generator 
(Dynamic Grammar Gen.) is provided within application logic 2309 and is adapted to 
generate grammar keywords in association with non-recurring dialog content wherein the 
user, to retrieve instant results in a dynamic fashion, can speak the generated keywords. 

New components not before introduced within the application logic in server 2301 
are a static optimizer 2312 and a dynamic optimizer 2311. The goal of the present
invention is to optimize reduction of data traffic between portals 2302 and 2303 (if Web
enabled) and voice application server 2301. Accomplishing a reduction in data traffic
between the voice application server and voice portals is especially important where the
components are remote from one another and connected through relatively narrow data
pipelines. Such pipelines can become bottled up with data at peak performance periods
during operation, causing a notable delay in response time at the voice portals. More
detail about optimizers 2312 and 2311 and their relationship to the dialog runtime
processor will be provided later in this specification. 

Server 2301 has a data/resource adapter block 2310 that contains all of the 
required modules for interfacing to external and to internal data sources. For example, an 
application manager (App. Mgr.) is provided within adapter 2310 and is adapted as a 
main interface module to user-end systems such as portals 2302 and 2303. The
application manager provides the appropriate data delivery of dialogs in order of
occurrence, and in a preferred embodiment of the invention delivers static and dynamic
dialog pieces (determined through optimization) for storage to one or more cache systems
local to the user's end system. More about the role of the application manager will be
provided further below.

A report manager (Report Mgr.) is within adapter 2310 and is adapted to work 
with the application manager to provide reportable statistics regarding operation of voice 
application interactions. Report manager tracks a Fonelet (voice application) until it is 
completed or terminated. Background statistics can be used in the method of the present
invention to help determine what dynamic (non-recurring) dialog pieces of a voice
application should be cached locally on the user-end.

A third-party Web-service provider 2313 is illustrated in this example as external 
to server 2301 but linked thereto for communication. Third-party service 2313 represents 
any third-party service provider including software that can be used to tap into the voice
application development and deployment services hosted within server 2301. Thin
software clients licensed by users fall under third-party applications as do Web-based
services accessible to users through traditional Web sites. To facilitate third-party
connection capability, server 2301 has a Web resource connector (Web. Res. Conn.) that
is adapted as a server interface to third-party functions. A Fonelet event queue (Fonelet
Event Queue) is provided within adapter 2310 and is adapted to queue incoming and
outgoing Fonelet (voice application) events between the server and third-party-provided
resources. A Fonelet XML interpreter (Fonelet XML Int.) is provided within adapter
2310 and adapted to interpret XML documents incoming to or outgoing from the Fonelet
event queue.

A resource manager (Resource Mgr.) is provided within adapter 2310 and is
adapted to manage access to all accessible resources both external and internal. It is
noted that internal resources may be maintained within the server itself, or within a
domain of the server, the domain including other systems that may be considered within
the domain such as internal data systems within a contact center hosting the voice
application server, for example. A database access manager (Database Access Mgr.) is
provided within adapter 2310 and is adapted to facilitate data retrieval from persistent
data storage provided and associated with data stores located internally to the domain of 
server 2301. 

A VXML rendering engine 2314 is provided within application server 2301 and is 
adapted to render VXML pages in conjunction with the dialog controller in VADS 2308. 
Rendering engine 2314 is analogous to engine 111 described with reference to Fig. 22
and Fig. 6 above. 

Server blocks 2310, 2309, 2308, and engine 2314 communicate and cooperate 
with one another. Communication and cooperation capability is illustrated in this 
example by a logical server bus structure 2315 connecting the blocks for communication.
A similar logical bus structure 2316 is illustrated within portal 2302 and connects the 
internal components for communication. 

As previously described, a voice application, once launched, comprises a
series of interactive dialog pieces that produce both static and dynamic results. For 
example, a company greeting that is played to every caller is considered a static greeting 
because there are no dynamic changes in the dialog from caller to caller. However, a 
dialog response to a user-request for a stock quote is considered dynamic because it can 
vary from caller to caller depending on the request. Similarly, data results pulled from a 
database or other external data source that are embedded into response dialogs cause the 
dialogs themselves to be considered dynamic because, although the basic template is
static, the embedded results can vary between callers.
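
The distinction between static and dynamic dialogs can be illustrated by checking whether a dialog template contains embedded result fields. The sketch assumes templates written as Python format strings, which is an illustrative choice and not the format used by the described system; the sample prompts are invented.

```python
from string import Formatter


def is_static(template: str) -> bool:
    """A dialog is static when its template embeds no replacement fields."""
    return all(field is None
               for _, field, _, _ in Formatter().parse(template))


GREETING = "Thank you for calling. Please say your account number."
QUOTE = "{symbol} is trading at {price} dollars."

print(is_static(GREETING))  # True
print(is_static(QUOTE))     # False
print(QUOTE.format(symbol="ACME", price="12.50"))
# ACME is trading at 12.50 dollars.
```

The quote template itself never changes, yet each rendering depends on the request, which is why the resulting dialog is treated as dynamic.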

Static optimizer 2312 and dynamic optimizer 2311 are provided to work in
cooperation with the dialog runtime processor to identify pieces of dialog that should be
distributed to end system cache storage facilities for local access during interaction with
an associated voice application. Optimizers 2312 and 2311 are software modules that
monitor and read dialog files during their initial execution or when the associated voice
application is modified. Static optimizer 2312 cooperates with the rule manager and tags,
according to business rule, certain files that can be labeled static or recurring files that do
not change from caller to caller. Dynamic optimizer 2311 cooperates with the rule
manager and tags, according to business rule, certain files that are non-recurring from
caller to caller, but are repeated often enough to warrant distributed caching to a cache
local to an end system through which the associated voice application is accessed.
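
A simple way to decide which dynamic dialogs are "repeated often enough" is to count rendered responses and apply a threshold. In the sketch below the threshold min_repeats stands in for a business rule, and the log format and function name are assumptions rather than details taken from the specification.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def select_for_cache(static_ids: Set[str],
                     rendered_log: Iterable[Tuple[str, str]],
                     min_repeats: int = 3) -> Dict[str, set]:
    """Tag static dialogs, plus dynamic renderings seen min_repeats times.

    rendered_log holds (dialog_id, rendered_text) pairs observed at runtime.
    """
    counts = Counter(entry for entry in rendered_log
                     if entry[0] not in static_ids)
    dynamic = {entry for entry, hits in counts.items()
               if hits >= min_repeats}
    return {"static": set(static_ids), "dynamic": dynamic}


log = [("quote", "ACME is trading at 12.50 dollars.")] * 4
log += [("quote", "ZETA is trading at 3.10 dollars."),
        ("greeting", "Welcome.")]
tags = select_for_cache({"greeting"}, log)
print(tags["static"])   # {'greeting'}
print(tags["dynamic"])  # {('quote', 'ACME is trading at 12.50 dollars.')}
```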

In one embodiment, optimizers 2312 and 2311 are embedded modules running 
within the dialog runtime processor. In another embodiment, the optimizers are separate 
modules that are activated by the runtime processor when it processes dialogs of a 
particular voice application. 

When an administrator changes a voice application, or when a brand new voice 
application is created, then optimization processes of optimizers 2311 and 2312 are
invoked to determine which data out of the application flow needs to be cached. Tagging
can take the form of various file identification regimens known in the art. In a preferred
embodiment, standard HTTP/1.1 tagging is used. The optimizing components 2312 and
2311 can either add tags to untagged files, or, in some cases, remove tags from already
tagged files. This automated process allows an administrator to create dialogs without 
worrying about distribution issues that are associated with data traffic between servers. 
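
The specification does not say which HTTP/1.1 headers carry the tags. As one plausible reading, the sketch below uses the standard Cache-Control and ETag response headers; the tag names, the lifetimes, and the function name are assumptions made for illustration.

```python
import hashlib
from typing import Dict

# Assumed cache lifetimes in seconds for each tag; real values would be
# set by business rule.
LIFETIMES = {"static": 86400, "dynamic": 300}


def cache_headers(body: bytes, tag: str) -> Dict[str, str]:
    """Build HTTP/1.1 caching headers for a dialog file.

    tag is "static", "dynamic", or "none" (an untagged or de-tagged file).
    """
    if tag == "none":
        # Removing a tag: tell downstream caches not to keep the file.
        return {"Cache-Control": "no-store"}
    if tag not in LIFETIMES:
        raise ValueError(f"unknown tag: {tag}")
    return {
        "Cache-Control": f"public, max-age={LIFETIMES[tag]}",
        # A content hash lets a cache detect a revised file.
        "ETag": '"' + hashlib.sha256(body).hexdigest() + '"',
    }


print(cache_headers(b"<vxml>Welcome</vxml>", "static")["Cache-Control"])
# public, max-age=86400
```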

For static files, optimizer 2312 identifies which files to cache at an end system, 
tags them appropriately and prepares the tagged files for distribution to identified end- 
system cache. In the case of portal 2302 being the end system, the static files of a voice 
application would be stored locally in block 2305 in server cache. In one embodiment, 
the distributed static files are cached at a first deployment of a recently modified or brand 
new voice application. The first consumer to access the application will not experience 
optimum performance because the static files are cached during that first 
interaction. However, a subsequent consumer accessing the application from portal 2302, 
or a first caller that repeats the static portion of the application will experience a 
performance increase because the telephony server will access and serve the static portion 
of the application from local cache instead of retrieving the dialogs from application 
server 2301 every time they are requested. It is noted herein that caching static and 
dynamic content is temporary in a preferred embodiment. That is to say that when a 
voice application is no longer used by the enterprise, or is replaced by a new application, 
the unnecessary files are deleted from the cache systems. 

Once static dialogs from voice applications are distributed to and cached within 
the telephony server portion of portal 2302, they can remain in cache for subsequent 
retrieval during subsequent interaction with associated voice applications. However, if a 
voice application is subsequently modified by an administrator and different dialogs are 
now identified as static cacheable dialogs, then those dialogs already cached will be 
replaced with the newer updated static dialogs. Any common form of identification and 
revision strategy can be used to synchronize the appropriate static files. Some dialogs 
may simply be dropped from an application being modified while other static dialogs may 
be newly added. In these instances of subsequent application modification concerning the 
presence of new, deleted or modified files that are deemed static, the synchronization of 
these files with those already stored can take place before an application is scheduled to 
be deployed to the end system, or during runtime of the application. 
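
The synchronization described above can be sketched as a comparison of revision numbers. This is a minimal sketch under the assumption that both the end-system cache and the modified application expose a mapping from dialog name to revision; the function name plan_sync and the mapping layout are hypothetical.

```python
def plan_sync(cached, manifest):
    """Compare cached static dialogs with the manifest of a modified application.

    cached:   {dialog name: revision held in the end-system cache}
    manifest: {dialog name: revision that the application now deems static}
    Returns (to_fetch, to_delete) as sorted lists of dialog names.
    """
    # New dialogs and dialogs whose revision changed must be (re)distributed.
    to_fetch = sorted(
        name for name, rev in manifest.items() if cached.get(name) != rev
    )
    # Dialogs dropped from the application are removed from the cache.
    to_delete = sorted(name for name in cached if name not in manifest)
    return to_fetch, to_delete
```

The same plan could be executed before deployment or during runtime; the sketch only decides what to transfer and what to discard.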

In a preferred embodiment of the invention caching of dynamic files is performed 
in the voice Web controller module within telephony software/hardware block 2305 of 
portal 2302. Dynamic files differ from static files in that dynamic files do not have to 
be retrieved during every execution and interaction with a voice application. Therefore, 
dynamic retrieval occurs only after user interaction with a voice application has begun. 
Statistical analysis can be used at voice application server 2301 to determine over several 
voice application deployments, which files make sense to continue to distribute to end- 
system cache facilities and, in some cases which files already cached for dynamic 
optimization should be deleted and subsequently removed from end-system local access. 

Fig. 24 is a process flow diagram illustrating steps for practice of the present 
invention. At step 2400a, a static greeting message is played such as "thank you for 
calling XYZ corporation". Once a voice application containing this dialog has been 
accessed from an end system, the particular dialog is stored locally if it is identified as a 
static dialog. Each time a subsequent access is made to the same voice application, 
greeting 2400a is pulled from local cache in step 2401 when ordered. 

At step 2400n a last static message is played, which in this embodiment represents 
a menu message. It will be appreciated that there may be multiple static dialogs in a 
voice application, as indicated in this example by the element assignment of 2400a-n. 
Each time any static message 2400a-n is required in the voice application 
execution, it is pulled from local cache in step 2401. The message played at step 2400n is 
a precursor to interaction such as "We have changed our menu. Please listen carefully. 
Your phone call may be recorded for training purposes." 

Because messages 2400a-n are played at the beginning part of, for example, an 
IVR interaction regardless of who the caller is, they can be statically cached within the 
telephony server representing the accessed end system or application consumer. As 
described above, HTTP 1.1 standard tags may be used to indicate which 
material to cache. The local server keeps the static files in store and uses them according 
to the appropriate application flow whenever a call comes in to the number or extension 
of that particular voice application. In some cases voice applications will be numerous at 
a single contact number with extensions separating them for access by callers. 

Without local caching of the static content, the telephony server would 
typically make a request to the Web controller, which would then send a request to the 
runtime processor and fetch the message from the dialog runtime processor. The sound 
file would be sent from the processor back over the same network connection to the 
telephony server for instant play. It will be appreciated that local caching of dialog 
portions of a dynamic interactive voice application saves significant bandwidth between 
the portal and the application server. Examples of other types of static dialogs that may 
be cached locally to an end system include hours of operation, location or driving 
instructions, billing address, and so on, which, in essence, never change dynamically. 

At step 2402, a user interacts with the voice application by initiating a selection 
resulting from the menu option dialog of step 2400n. At step 2403a a dynamic menu 
option or result is played. The option or result is retrieved as a result of the user-initiated 
selection or interaction to a previous static dialog. Therefore the next dialog the user 
hears is considered non-recurring or dynamic. This means that the result or menu option 
can vary in content from call to call, the variance ordered by the first user interaction with 
the voice application. 

The rules that will govern whether or not to distribute a dialog to the local cache 
of an end-system through which a particular voice application is accessed can vary 
according to content, number of possible options or results, and in some cases statistical 
probability. For example, if a voice application is created for a banking institution 
wherein a dynamic menu has options for being transferred to a loan officer, a standard 
teller, or an automated account attendant, and statistically, 90% of all callers choose the 
transfer to the automated attendant, then the subsequent beginning dialog of the voice 
application associated with automated banking can be cached locally. In this case, the 
first two options request a live connection, thereby terminating the voice application. The 
third option links to another dialog of the same application or to another application 
entirely. It will follow then that the next dialog may be static because it merely asks the 
caller to enter identification criteria. It is the same dialog for all callers who select 
"automated attendant". 
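
The statistical rule in the banking example can be sketched as a simple share-of-selections test. This is a minimal sketch; the function name options_worth_caching, the option labels and the 50% cut-off are assumptions made for illustration only.

```python
def options_worth_caching(selection_counts, min_share=0.5):
    """Return the menu options whose follow-on dialog merits local caching.

    selection_counts: {option name: number of callers who chose it}
    min_share: assumed fraction of all selections an option must reach.
    """
    total = sum(selection_counts.values())
    if total == 0:
        return []  # no usage statistics yet, so nothing is cached
    return sorted(
        option
        for option, count in selection_counts.items()
        if count / total >= min_share
    )


# Usage mirroring the example: 90% of callers choose the automated attendant.
counts = {"loan_officer": 40, "teller": 60, "automated_attendant": 900}
popular = options_worth_caching(counts)
```

With these counts only the automated attendant clears the threshold, so only the dialog that follows it would be distributed to the end-system cache.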

It is noted that criteria for dynamic optimization may vary widely. For example, 
personal information results embedded into a standard dialog template must be retrieved 
from the data sources of the institution and cannot be locally cached. However, the 
standard menu soliciting the interaction resulting in data fetch of personal information 
can be cached locally. 

Dialogs that are assigned to dynamic caching are retrieved from a Web controller 
in step 2403 each time they are selected. Moreover, step 2402 may occur repeatedly 
between dynamically cached dialogs. At step 2403n, a last dynamic menu option is 
played in a voice application sequence. It may be that statistically only a few users 
navigate to the end of the voice application or last menu. Therefore it may not be 
considered for local caching. However, many standard dynamic options and results can 
be dynamically cached in the event that probability is high that a large number of callers 
are going to request the option or result. 

Results that typically are not fluid, such as perhaps the desired model and make of 
a product, are dynamic results because there are other results available for return through 
interaction with the interactive menu. The most popular results can be dynamically 
cached as dialogs that can be retrieved locally even though every caller will not interact 
with the same result. Optimizers share database accessibility with all of the other 
modules described with respect to the application server of Fig. 23. Therefore, results 
that are commonly requested, although not completely static, can be embedded into the 
dialog template and saved locally as a voice application dialog linked through to a certain 
selection made as a response to a previous dialog of the same application. 

In some cases of dynamic caching, the standard dialog is there without the 
embedded results, which are dynamic. In this case, a client application can be provided 
that retrieves the requested data using the voice application server as a proxy and embeds 
the data into the template locally to the user. After the user has accessed the data 
and moved on in the application, the embedded data is deleted from the template 
until the next invocation. There are many possibilities. 

It will be apparent to one with skill in the art that the method and apparatus of the 
invention can be applied to access of both internal data sources as well as external data 
sources wherein some of the external data sources are network-based data sources 
analogous to Web-hosted data and data available over other types of digital data 
networks. 

Text to Speech Preprocessing 

In one aspect of the invention, a text-to-speech preprocessor is provided as an 
enhancement to the voice application system of the invention. The method and apparatus 
of the invention are described in detail below. 

Fig. 25 is a block diagram of the VXML architecture of Fig. 23 enhanced with a 
text-to-speech preprocessor 2501 according to an embodiment of the present invention. 
A VXML architecture 2500 is illustrated in this example and is analogous to VXML 
architecture 2300 described with reference to Fig. 23 above. VXML architecture 2500 is 
enhanced in this example with a capability of rendering specialized voice pronunciations 
of phrases and terms according to rules based on socioeconomic demographics, industry 
specific terms, and regional demographics. 

All of the components illustrated in Fig. 23 above are also illustrated in this 
example. Therefore, formerly introduced components that are not modified as a result of 
the present invention shall retain the same element numbers. Voice application server 
2301 has software functional blocks 2314, 2308, 2309, and 2310 providing server 
functionality as was described with reference to Fig. 23. 

As previously described, VXML pages are generated and incorporated into a 
voice application that is dynamic in the sense that individual VXML pages may contain 
dynamic content and are generated on the fly during caller interaction. Voice application 
development software enables an administrator from a remote station to create voice 
applications using templates and schedule them for deployment. In some cases, 
consumers access the applications in a pull fashion. In other instances, the voice 
applications are deployed as outbound applications that are pushed. Distribution of voice 
applications created in server 2301 may include unicast, multicast and broadcast methods. 

Voice application dialogs are in a preferred embodiment transmitted to portals 
such as telephony/voice portal 2302 and/or Web portal 2303 in the form of VXML pages. 
In server 2302, VXML interpreter 2307 renders VXML to synthesized voice, which is 
then spoken to a caller, in this case through telephony hardware block 2305. Similarly, 
responses from the caller are captured and rendered as XML for interpretation at the 
application server, which according to the interpreted response content, generates a new 
VXML page sent as a next dialog for the caller. Any data fetches performed result in the 
fetched data being included into the next VXML rendering or dialog. 

A text-to-speech (TTS) preprocessor 2501 is provided within block 2309 of 
application server 2301. TTS 2501 is adapted to preprocess text streams of dialogs with 
special instruction sets dealing with which of optional text renderings will be selected for 
inclusion into a VXML page or dialog. TTS preprocessor 2501 is connected to the 
runtime dialog processor as shown by a directional arrow. Before the dialog runtime 
processor processes a text dialog for normal VXML rendering according to enterprise 
rules, TTS preprocessor 2501 annotates the text dialog according to information known 
about the voice application consumer, which typically is a caller interacting with the 
voice application. Known information can be information that is pre-known about a 
caller including location, region, preferences, education level, and so on. Known 
information can also be information that is gleaned from the caller at the time of 
interaction through caller line identity (CLID) and other connection identification 
mechanisms and through direct interaction with the caller by analyzing caller responses 
during interaction. 

TTS preprocessor 2501 has access to a dynamic hit list embodied as a table (not 
shown) that has options of different text renderings. Each rendering is associated with 
and, in fact, may be created according to one or more conditions that can be associated 
with the caller, the caller's region, demographic information and/or type of transaction or 
scenario. TTS preprocessor 2501 matches information gleaned from and/or pre-known 
about the caller to one or more of the rules or rule sets and then annotates the XML 
response stream accordingly. Annotation in this embodiment means selecting a specific 
text portion of a response from a variety of text options presented in the table. The 
subsequent VXML page rendered instructs speech generation at the caller's end according 
to the annotated XML instructions. Therefore, the actual synthesized speech that the 
caller hears is dynamic in that it can vary between callers using the same voice 
application. 

TTS preprocessor 2501 has access to the Rule Mgr, which serves the "hit list" 
associated with a specific transaction occurring as a voice interaction between the caller 
and the enterprise. For example, if a caller is located in a specific county in Florida and 
wants directions to a specific State highway wherein the highway name is essentially used 
in more than one location covered by the service, then TTS preprocessor 2501 would 
annotate a text response for VXML rendering that would take into account the caller's 
specific location. The rule, then, would constrain the response to the specific highway 
name used locally from the viewpoint of the caller. Perhaps in northern counties of 
Florida the highway name is "State Route 25", whereas in southern counties of Florida 
route 25 is more predominantly known as "Bean Memorial Freeway". Assuming the 
location of the caller to be in a southern county, the generated response interpreted at 
VXML interpreter 2307 would contain instructions for vocalizing "Bean Memorial 
Freeway" instead of "State Route 25". 
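
The rule matching in the Florida example can be sketched as a lookup in a small hit list. This is a minimal sketch, assuming each hit-list entry pairs a set of caller conditions with a text rendering; the structure HIT_LIST, the "region" key and the functions select_rendering and annotate are hypothetical names introduced for illustration.

```python
# Hypothetical hit list: entity -> ordered list of (conditions, rendering).
HIT_LIST = {
    "State Route 25": [
        ({"region": "south"}, "Bean Memorial Freeway"),
        ({"region": "north"}, "State Route 25"),
    ],
}


def select_rendering(entity, caller_info, hit_list):
    """Return the first rendering whose conditions all match the caller."""
    for conditions, rendering in hit_list.get(entity, []):
        if all(caller_info.get(key) == value for key, value in conditions.items()):
            return rendering
    return entity  # no rule matched, so the standard text is kept


def annotate(text, caller_info, hit_list):
    """Replace each known entity in a response with the caller-specific text."""
    for entity in hit_list:
        text = text.replace(entity, select_rendering(entity, caller_info, hit_list))
    return text


southern = annotate("Take State Route 25 east.", {"region": "south"}, HIT_LIST)
```

A caller with no known region falls through every rule and hears the standard name, which keeps the default response intact when nothing is known about the caller.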

A wide variety of text variances related to industry specific terms, proper names 
of locations, names of roadways, and so on can be collected by a service-hosting 
enterprise and aggregated into application-dependent response options that are tabled as 
described above and then selected dynamically according to match of information-to-rule 
set for each session of voice interaction with a caller. The same voice application can 
therefore deliver dynamic responses tailored to a specific caller using the application. 

Fig. 26 is a block diagram illustrating possible variances of speech renderings of a 
text string. In this example, variant possibilities of text-to-speech renderings are 
illustrated for a standard freeway entity "HWY 101" illustrated as a standard text block 
2600. A text-to-speech option 2600a instructs a speech generator to vocalize the 
rendering phonically as it is read, "Highway one hundred and one". A text-to-speech 
option 2600b instructs the speech generator to enunciate each character in quotation 
marks, "H" "W" "Y" "1" "0" "1", which phonetically may sound like "aich doubleu why 
one zero one". 

A text-to-speech rendering 2600c instructs a speech generator to enunciate "H" 
"W" "Y" as described above for rendering 2600b, but with the variant enunciation of 
"one hundred and one" instead of "one zero one". A text-to-speech rendering 2600d 
instructs a speech generator to enunciate "Highway" as does rendering 2600a, but with the 
variance "1 oh 1", which may phonetically sound like "one oh one". The variances 
illustrated herein reflect just one example of how a standard entity "HWY 101" may be 
textually varied to produce different voice dialogs that sound different from one another 
to a caller. The selection of which rendering to apply will depend on information about 
the caller that is pre-known or, in some instances, gleaned from the caller in real time. 
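
The four renderings of "HWY 101" can be reproduced with a short sketch. It assumes the expansion of the abbreviation and the cardinal reading of the number are supplied by the enterprise rather than computed; the function names and the dictionary keys (which reuse the element numbers of Fig. 26 only as labels) are illustrative assumptions.

```python
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def spell_digits(number, zero="zero"):
    """Read a digit string one digit at a time, e.g. "101" -> "one zero one"."""
    return " ".join(zero if d == "0" else DIGITS[int(d)] for d in number)


def spell_letters(abbrev):
    """Separate the characters so each is enunciated, e.g. "HWY" -> "H W Y"."""
    return " ".join(abbrev.upper())


def hwy_variants(abbrev, expansion, number, cardinal):
    """Build the four text variants for an abbreviation followed by a number."""
    return {
        "2600a": f"{expansion} {cardinal}",
        "2600b": f"{spell_letters(abbrev)} {spell_digits(number)}",
        "2600c": f"{spell_letters(abbrev)} {cardinal}",
        "2600d": f"{expansion} {spell_digits(number, zero='oh')}",
    }


variants = hwy_variants("HWY", "Highway", "101", "one hundred and one")
```

Which of the variants is placed into the response would then depend on the caller information, as the paragraph above describes.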

In one embodiment of the present invention, a first voice initiation as a response 
to a dialog option may be analyzed for enunciation or "drawl" tendencies. For example, 
if a caller has a thick accent that is categorical in nature, the dialog response to the caller 
may be synthesized so as to mimic the caller's accent or dialect. The benefit of mimicking a 
dialect during interaction is to make a caller feel more at ease with using the system. For 
example, in some areas of New York City, locals use certain slang terminology for 
regionally known landmarks. If the slang term is pre-known by the enterprise, then it can 
be used in a dialog response to a caller exhibiting the slang terminology. Furthermore, 
certain industry specific terms may have different meanings for different industries. If the 
caller is identified as an industry specific caller in a service that offers dialog related to 
more than one industry, then the correct term can be dynamically applied in a response to 
the caller. 

Fig. 27 is a block diagram illustrating an organized mapping table 2700 according 
to an embodiment of the present invention. Table 2700 represents a software table 
provided within the TTS preprocessor of Fig. 25 or within an external data store that is 
accessible to the processor. Table 2700 is a hit list containing text-to-speech renderings 
and links to generated speech files associated with them. 

Table 2700 has a column 2701 that contains a selection of text-to-speech entities 
TS-1 through TS-5. Text entities TS-1 through TS-5 are structurally analogous to entity 
2600 described with reference to Fig. 26 above. That is to say that TS entities correspond 
to basic identifiable terms including industry terms, place names, highways, state roads, 
landmarks, and so on. Table 2700 has a column 2702 listing spoken expressions or 
text-to-speech renderings 1-6. Spoken expressions 1-6 are pre-prepared text renderings that 
correspond to the items (TS-1 through TS-5) contained in column 2701. HTTP 1.1 or other 
types of reference links (arrows), the plurality of which is represented herein by element 
number 2703, link each item in column 2701 to at least one item in column 2702. 

In this example, only one of the TS entities in table 2700 is linked to more than 
one variant text-to-speech rendering. TS-5, which may be "HWY 101", for example, is 
linked to spoken expression 5 and to spoken expression 6. Spoken expression 5 instructs 
the proper enunciation of "HWY 101" used in the area of Los Angeles, California 
whereas spoken expression 6 instructs the proper enunciation of "HWY 101" as used in 
the San Francisco Bay Area. Of course it is assumed that the entity HWY 101 is spoken 
differently depending on region. Expressions 5 and 6 are analogous to text renderings 
2600a-d of Fig. 26. Therefore, the expressions 5 and 6 are XML text renderings that are 
selectable options based upon the rule of geographic origin of the application consumer or 
caller. 

It is presumed in this example that actual voice synthesis (generating an audible 
voice dialog response) is performed at the caller's end (voice portal) based on the 
instructions provided by a VXML page containing one of the expressions 5 or 6. In one 
embodiment, the voice files are pre-prepared by an administrator and distributed to end 
systems as part of the application deployment scheme. In this case, a pre-recorded voice 
file is selected based on interpretation of a received expression, in this case 5 or 6. 

Table 2700 may be a dynamic table in the sense that it may be constructed as a 
generic template and, depending upon the voice application being run, accessed and 
populated with the appropriate entities and text expression options used by the 
application at the time of application deployment. When a user interacts with the voice 
application, TTS preprocessor 2501 accesses the populated table and determines 
which TTS expressions to select based on information either pre-known about or 
provided by the instant caller interacting with the application. 

In one embodiment of the present invention, if interaction with one voice 
application triggers deployment of another voice application having unrelated content, 
then information about the caller, in some cases gleaned from interaction with the first 
application, is automatically passed to the domain of the second application for use in 
generation of a second table related to the new content options. It is noted herein that 
content-specific tables associated with a single voice application can be generated on the 
fly from a master table of data stored in a system database accessible to processing 
components. 

The ability to personalize automated voice responses sent from an enterprise to 
callers using voice applications developed by the enterprise provides an interaction 
experience for the caller that is enhanced over traditional monotone computerized 
responses that are the same for every user. Such enhancement provides not only useful and pragmatic 
"translations" that are more understandable to the caller, but also entertainment value 
prompting more frequent use of such voice application distribution systems. 

Behavior-State Adaptation 

Fig. 28 is a block diagram of the VXML architecture of Fig. 25 enhanced with a 
behavioral adaptation engine according to an embodiment of the present invention. A 
VXML application deployment architecture 2800 is illustrated in this example and is 
analogous to VXML architecture 2500 described with reference to the description of Fig. 
25 above. Previously described components retain their original element numbers 
introduced in description of Figs. 23 and 25 above. Architecture 2800 comprises 
application server 2301, telephony server/voice portal 2302, and Web portal 2303. 

Voice application server 2301 is enhanced with a behavioral adaptation engine 
2801. Behavioral adaptation engine 2801 is adapted to discern user behavioral states and, 
in some cases, emotional states during interaction with a voice application dialog. Engine 
2801 is part of application logic block 2309, which includes the previously described 
text-to-speech preprocessor 2501. Static optimizer 2312 and dynamic optimizer 2311, 
which were described with reference to the description of Fig. 23 above, are not illustrated 
in this example for reasons of preserving drawing space, but may be assumed to be 
present. 

Adaptation engine 2801 has direct access to a dialog runtime processor (Dialog 
Runtime Prcsr.) as illustrated herein by a double arrow. Adaptation engine 2801 also has 
communication access through logical bus structure 2315 to VXML rendering engine 
2314, voice application development software (VADS) block 2308, and 
database/resource adapter block 2310. 

As interaction takes place between a caller and an enterprise using the voice 
application software of the present invention, responses to menu options and the like 
vocalized by the caller at the caller's end are rendered in a preferred embodiment as 
XML-based text and are interpreted at voice application server 2301 for determination of 
a subsequent response to be delivered to the caller. The response from the enterprise may 
vary, in this case, from caller to caller and is rendered as VXML (VXML page) for voice 
synthesis at the portal used by the caller, in this case portal 2302 or Web portal 2303. 

Adaptation engine 2801 is adapted to intercept VXML responses from a caller 
during interaction and to analyze the response according to a set of behavioral constraints 
that are linked to response options, which are selected and then embedded into a VXML 
response that is played as a synthesized voice to the caller. 

The variety of behavioral constraints that may exist for determination of a proper 
and correct VXML page response is not limited. For example, a user may be very 
familiar with a particular set of voice application menus through repetitive use. During 
menu rendering, the just-mentioned user may be predisposed to selecting a particular 
option further down in the menu tree before the option is reached in the menu. The 
behavior, then, of that particular user, is that the user vocalizes the sub-option nearly 
every time the service is accessed. In this case, adaptation engine 2801 recognizes the 
selection before the offering and determines that this particular user is very familiar with 
the service and menu tree. The next time the same user calls the service, the correct result 
can be delivered to the user immediately, skipping menu navigation. The kind of 
behavior covered above is menu navigational behavior. 

Another type of behavior that can be determined on a case-by-case basis is the 
degree of caller stress. Engine 2801 can determine a stress level for a particular caller by 
analyzing response content. There are two forms of response content that can be analyzed 
by engine 2801 in a preferred embodiment. These are VXML text-based content and 
voice samplings attached to the VXML documents. For example, 
certain expletives or other "negative" words or phrases can be recognized and rendered if 
a caller verbalizes them while interacting with a voice application. Adaptation engine 2801 has 
access to external resources such as Rule Mgr. and external data stores through adapter 
block 2310. 
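
Recognition of "negative" words in the text form of a response can be sketched as a simple term count. This is a minimal sketch; the term list NEGATIVE_TERMS, the level names and the cut-offs are assumptions for illustration, and a deployed system would presumably obtain its terms from enterprise rules.

```python
import re

# Hypothetical list of terms an enterprise might treat as signs of stress.
NEGATIVE_TERMS = {"terrible", "useless", "ridiculous", "cancel"}


def negative_term_hits(response_text, negative_terms=NEGATIVE_TERMS):
    """Return the negative terms found in the text rendering of a response."""
    words = re.findall(r"[a-z']+", response_text.lower())
    return [word for word in words if word in negative_terms]


def stress_level(response_text, negative_terms=NEGATIVE_TERMS):
    """Map the number of negative terms to a coarse, assumed stress level."""
    hits = len(negative_term_hits(response_text, negative_terms))
    if hits == 0:
        return "calm"
    return "elevated" if hits == 1 else "high"
```

The coarse level returned here is the kind of result a decision component could compare against constraints when choosing the next dialog.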

In one embodiment, adaptation engine 2801 can analyze short WAV files or other 
digitized voice files that can be sent along with XML-based text files. In this way stress 
levels of a caller can be determined and response selections then based on the determined 
levels. For example, if a caller vocalizes at a decibel level above a pre-set threshold, he or 
she may be determined to be shouting. The decibel level can be analyzed from a short one- 
or two-second audio file recorded at the start of an interaction segment and then sent as 
an attachment along with the text rendering of the caller's response. 
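
The loudness test can be sketched with the Python standard library. This is a minimal sketch, assuming the attachment is 16-bit PCM WAV data and that loudness is measured as RMS level in decibels relative to full scale (dBFS); the threshold of -10 dBFS and the helper make_tone, which only generates test audio, are assumptions and not part of the specification.

```python
import io
import math
import struct
import wave


def rms_dbfs(wav_bytes):
    """Return the RMS level of 16-bit PCM WAV data in dB relative to full scale."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit PCM only")
        frames = wav.readframes(wav.getnframes())
    count = len(frames) // 2
    if count == 0:
        return float("-inf")
    samples = struct.unpack(f"<{count}h", frames)
    rms = math.sqrt(sum(s * s for s in samples) / count)
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


def is_shouting(wav_bytes, threshold_dbfs=-10.0):
    """Compare the clip's level with a pre-set (assumed) threshold."""
    return rms_dbfs(wav_bytes) > threshold_dbfs


def make_tone(amplitude, seconds=1.0, rate=8000, freq=440.0):
    """Generate a mono sine-wave WAV clip, used here only as test input."""
    n = int(seconds * rate)
    samples = [int(amplitude * math.sin(2 * math.pi * freq * i / rate))
               for i in range(n)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"<{n}h", *samples))
    return buf.getvalue()
```

A one-second clip near full amplitude exceeds the assumed threshold, while a quiet clip does not; a real deployment would calibrate the threshold against the telephony equipment in use.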

Speed of menu navigation can be determined by engine 2801 to roughly determine 
the level of experience the caller has with the particular system of the enterprise. 
Subsequent menus and sub-options may be annotated, dropped or added based on results 
of an "experience level" determination of a particular caller. There are many behavioral 
considerations that may be taken into account. 

Adaptation engine 2801 may be provided as a logic that cooperates with TTS 
preprocessor 2501 and with the previously mentioned optimizers to fine-tune menu and 
option selections for subsequent rendering as VXML pages to the caller during 
interaction. In some embodiments, engine 2801 can override other dialog enhancement 
tools if constraints allow. For example, if a caller is extremely upset, a subsequent voice 
application dialog may provide a live connection option to a live agent for immediate 
redress of the caller's concerns. 

Fig. 29 is a process flow diagram illustrating user interaction with the system of 
Fig. 28 according to one embodiment of the invention. At step 2900 a user accesses a 
voice application. A voice application can be accessed through any voice-enabled portal. 
In some cases a user calls the voice application. In other embodiments, the voice 
application calls the user and the user simply picks up the call. In still other applications, 
depending in part on media support, the voice application may be broadcast or multicast 
to users. At step 2900, a greeting and a user identification regimen or menu may be 
executed. Identification may comprise a biometric function like voice imprint 
identification, or a password/PIN option. 

At step 2901, a first outgoing menu is played to the user. The menu is the first 
interactive menu of the dynamic voice application delivered to the user. If the application 
is executed on an interactive voice response (IVR) system, a combination of voice and 
touch-tone responses may be accepted as user responses; however, the focus of this 
example is on voice response. In this example process it is assumed that a constraint 
exists related to the navigation behavior of the user while navigating the menu delivered 
in step 2901. If, for example, a user responds by vocalizing a menu or sub-menu option at 
step 2902 before the menu of step 2901 concludes, then at step 2903 an adaptation engine 
analogous to engine 2801 described with reference to Fig. 28 analyzes the response. 

Response interception in step 2903 may be triggered or it may be continuous 
during an application. In the case of a user response before the menu has finished, the 
fact that a response came in before the time allotted for the menu expired could be a 
triggering factor for interception by the adaptation engine. Analyzing a response at step 
2903 may include matching the response with an appropriate response option or result at 
step 2904. The adaptation engine may form and store a record that reflects the user's 
selection of a specific sub-option well before the sub-option was available through menu 
navigation. In this case the engine may label the particular caller as "experienced" with 
the system, meaning that the caller has memorized a particular sub-option to expedite a 
returned result. 

At step 2905, the closest matching response option selected as a response to the 
analyzed caller's menu response is returned and a decision is made regarding the next 
menu option to play, if any, at step 2906. If the selected response from the enterprise 
comprises a dialog containing an embedded result requested by the user, there may be no 
further menu activity or interaction. If the caller request of step 2902 contains a request 
for a specific menu, then at step 2907 the selected menu option is embedded for VXML 
page rendering. In the meantime, at step 2908, any unnecessary menus that would 
normally have been part of the voice application are discarded and not sent to the user. 
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In the case of a non-experienced user, it is probable that step 2901 will play out 
completely before the user at step 2909 makes a selection. At step 2910, the adaptation 
engine may store a record that the user is "new". In any event, the engine will intercept 
and analyze the user response (if configured to do so) and compare the user response with 
5 enterprise response options according to existing enterprise rules as described in step 
2904. Also as described above, at step 2905 the closest matching option for enterprise 
response is returned. At step 291 1 the next ordered menu option is played if one exists, 
or a fetched result embedded into the enterprise response dialog may be played at step 
2911. 
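
By way of illustration only, the navigation constraint of Fig. 29 may be sketched as
follows. The sketch assumes that the engine is given the playing time of the menu and the
time at which the caller spoke; the class, function, and field names are hypothetical and
are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class MenuResponse:
    """One caller response to a played menu (field names are hypothetical)."""
    option: str              # menu or sub-menu option the caller vocalized
    menu_duration_s: float   # playing time of the menu of step 2901
    response_at_s: float     # seconds after menu start when the caller spoke


def label_caller(resp: MenuResponse) -> str:
    """Label the caller by whether the response preceded the end of the menu."""
    return "experienced" if resp.response_at_s < resp.menu_duration_s else "new"


def menus_to_play(ordered_menus: list[str], current: str,
                  requested: str, label: str) -> list[str]:
    """Return the menus still to be sent to the caller after the response."""
    if label == "experienced" and requested in ordered_menus:
        # Step 2908: menus between the current and requested one are discarded.
        return [requested]
    # Step 2911: otherwise the next ordered menu is played, if one exists.
    nxt = ordered_menus.index(current) + 1
    return ordered_menus[nxt:nxt + 1]
```

With an ordered menu list of "main", "accounts", "balances", a caller who vocalizes
"balances" 3.5 seconds into a 12-second main menu would be labeled "experienced" and
would be sent only the "balances" menu.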

In the process outlined above, the behavioral constraint dictates that when a user
navigates ahead of the offered menus and sub-options, the unnecessary options in between
are dropped. Statistical probability can be used to further enhance response by the
system, for example, by causing a particular menu option or option result to be played to
a caller according to a determined probability that it is the correct response, based on
records of a number of the same transactions from the same caller.
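
One possible reading of the statistical enhancement is sketched below. It assumes that
the system keeps a per-caller list of previously requested options; the function name
and the two thresholds (a minimum share of 0.8 over at least 5 transactions) are
illustrative assumptions, not values given in the specification.

```python
from collections import Counter
from typing import Optional


def likely_option(history: list[str], min_probability: float = 0.8,
                  min_transactions: int = 5) -> Optional[str]:
    """Return the option to play proactively, or None if none qualifies.

    `history` holds the options this caller requested in past transactions.
    """
    if len(history) < min_transactions:
        return None  # too few recorded transactions to estimate anything
    option, count = Counter(history).most_common(1)[0]
    # Relative frequency serves as the estimated probability of correctness.
    return option if count / len(history) >= min_probability else None
```

A caller who asked for "balance" in nine of ten recorded transactions would have
"balance" selected; a caller who alternates between two options would not.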

One with skill in the art of voice application creation and deployment will
appreciate that the steps illustrated above may include sub-processes without departing
from the spirit and scope of the invention. For example, there may be sub-processes after
step 2906 for text-to-speech preprocessing and static or dynamic optimization before a
final VXML page is rendered as a response from the enterprise to the user. In some
cases, actions resulting from adaptation engine analysis can be configured to override
certain other processes. It will also be apparent to one with skill in the art that the goal
and order of the illustrated interaction process steps may change according to the nature
of constraints that will affect the process.

Fig. 30 is a process flow diagram illustrating user interaction with the system of
Fig. 28 according to another embodiment of the invention. At step 3000 a greeting is
played to a caller accessing the voice application. The greeting can be personalized to
individual callers based on caller ID, automatic number identification (ANI), or other
identification methods.

At step 3001, the caller logs into the voice application. This step is optional in
some cases. Login may include a biometric voice print identification, which can be
verified at the enterprise by analyzing a short audio voice sample of the caller that is
recorded and sent along with XML-based data as an attached audio file.

Assuming security approval of the caller of step 3001, a first menu of the voice
application is played to the caller at step 3002. If the caller is familiar with the system, as
was described above regarding the process illustrated with respect to Fig. 29, he or she
may respond before completion of the initial menu at step 3003a. At step 3003c, the
adaptation engine gauges the familiarity level that the caller has with the system based on
analysis of the response. For example, if the caller vocalizes a recognized menu sub-option
of a menu much further down a menu tree, then the caller may be labeled "system friendly"
and the appropriate streamlining of the voice application ensues.

In this exemplary interaction, then, the second offered menu and associated
options are skipped in step 3004, assuming the vocalization of step 3003a is determined
not to be the second menu or related options. At step 3005, the third offered menu is also
skipped; however, in step 3006 a sub-option result, which could be a fetched result
equating to one of the sub-options of the third menu, is played because it is the recognized
result value that "answers" the vocalization of the caller in step 3003a. The exact nature
of the dialog played at step 3006 will directly depend on the content of the caller's vocal
response at step 3003a. If the caller requests a specific menu instead of a specific "tuple",
then the appropriate menu is played.

The constraint associated with the just-mentioned order of steps is very similar to
the one described above with respect to the description of Fig. 29. That is, if the
caller already knows the options and sub-options, the system may skip all unnecessary
dialog of the voice application.

An additional constraint is included in the exemplary process illustrated. For
example, if the caller is indecisive in his or her response to the first menu of step 3002, as
is illustrated at step 3003b, then at step 3003c the adaptation engine may label the caller
as "system novice". Such labeling triggers the system to prepare a dynamic help menu at
step 3007, the menu containing options closely related to the nature of the caller's
indecisiveness. At step 3008 the interactive help menu is played to the caller.

The help menu can be dynamically constructed from dialog objects belonging to a
"whole help menu". The dynamic menu would include only the interactive options that
most closely relate to the caller's current problem as detected by the adaptation engine.
For example, if the caller vocalizes "option A or option B" because he is not sure which
one to go with, then the adaptation engine can trigger preparation of the appropriate
factoids, etc., that enable the caller to make a more informed decision. The dynamic help
menu can link back to the original menu when complete, allowing the caller to repeat the
selection after getting the required assistance. This can be accomplished without the
caller having to hang up and redial.
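
The construction of such a dynamic help menu may be sketched as follows. The sketch
assumes that the "whole help menu" is available as a mapping from option names to help
dialogs and that relevance is judged by simple substring matching on the recognized
utterance; the option names, dialog texts, and function name are hypothetical.

```python
# Hypothetical "whole help menu": one help dialog object per option.
WHOLE_HELP_MENU = {
    "checking": "Checking accounts have no minimum balance.",
    "savings": "Savings accounts earn interest on the daily balance.",
    "loans": "Loan payments are due on the first of each month.",
}


def build_help_menu(utterance: str, whole_menu: dict[str, str],
                    return_to: str) -> list[str]:
    """Select only the help dialogs for options named in the utterance."""
    text = utterance.lower()
    prompts = [dialog for option, dialog in whole_menu.items() if option in text]
    # Link back to the original menu so the caller can repeat the selection.
    prompts.append(f"Say 'back' to return to the {return_to}.")
    return prompts
```

For an indecisive utterance such as "checking or savings", the resulting menu would
contain the checking and savings dialogs followed by the link back to the original
menu, and would omit the loans dialog.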

Using the example above, different individuals can be serviced by the system
according to their needs. The faster, more experienced individuals have their voice
applications streamlined for faster service, whereas the slower individuals have their voice
applications extended according to their specific needs.

Fig. 31 is a process flow diagram illustrating user interaction with the system of
Fig. 28 according to yet another embodiment. At step 3100 a greeting is played assuming
caller access to a voice application as described further above. At step 3101a, the client
logs into the voice application, including password, pin, or perhaps voiceprint
identification as previously described. During step 3101a, the behavioral adaptation
engine samples the voiceprint or other vocalization of the client and detects a high stress
level in the caller's voice. The vocalization can be recorded and sent along with the
XML-based data as a digital audio file. Constraints geared to stress detection may include
decibel output thresholds. Decibel comparisons can be made with previous results
sampled during previous transactions by the same client to gauge the average decibel output
of the client's voice for the purpose of setting a particular decibel threshold for that client.
If the latest sampling is higher than the average range, then it may be determined that the
client is experiencing stress.

In another embodiment, any uttered expletives or inflection characteristics may
also indicate that the client is under stress. The adaptation engine can recognize these
general voice characteristics through sampling a short audio rendering taken at the first
interaction as previously described. Expletives, negative phrases, and inflection patterns
may be stored and standardized for the client over multiple transactions, enabling the
system to detect, for example, if the client is in a different than average mood. Moreover,
lack of certain phrases usually vocalized by the client may be an indication of a higher
than normal stress level. For example, if the client always says "may I have my balance
please" and a latest sampling is determined to lack the word "please", then the system may
decide that the client has a higher level of stress than normal. There are many differing
methods for configuring the constraints for stress detection.
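
Two of the stress constraints described above may be sketched as follows. The sketch
assumes that decibel levels of past samplings and the recognized text of the latest
utterance are already available. The specification does not define "the average range";
taking it as the mean plus two standard deviations is an assumption of this sketch, as
are all function and parameter names.

```python
import re
from statistics import mean, stdev


def decibel_threshold(past_levels_db: list[float], tolerance: float = 2.0) -> float:
    """Per-client threshold: mean of past samplings plus `tolerance` deviations.

    Requires at least two past samplings.
    """
    return mean(past_levels_db) + tolerance * stdev(past_levels_db)


def missing_habitual_words(utterance: str, habitual: set[str]) -> set[str]:
    """Return the words the client usually says that are absent this time."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    return habitual - words


def is_stressed(latest_db: float, past_levels_db: list[float],
                utterance: str, habitual: set[str]) -> bool:
    """Flag stress if the voice is unusually loud or a habitual word is missing."""
    too_loud = latest_db > decibel_threshold(past_levels_db)
    return too_loud or bool(missing_habitual_words(utterance, habitual))
```

Under these assumptions, a client whose past samplings lie near 60 dB would be flagged
by a 70 dB sampling, or by the utterance "may I have my balance" when "please" is a
habitual word for that client.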

Detecting a high stress level in the client triggers, in this exemplary interaction, a
decision to monitor the interaction at step 3101c. Monitoring may range from periodic
voice sampling by the adaptation engine during client interaction with the voice
application to continued monitoring by a live agent or supervisor. Steps 3101b and 3101c
can occur before the first menu is played at step 3102. At step 3103a the client vocalizes
an option from the previous menu. It is noted that the client is being monitored during his
response.

Assuming that the behavioral adaptation engine conducts the monitoring through voice
sampling and analysis, then at step 3103b in this exemplary process the
adaptation engine detects a continuance of high stress in the client. As part of a
monitoring constraint, at step 3103c the engine triggers the system to prepare a special
menu. The exact rule may require two or more samplings wherein the stress is high at each
sampling before triggering preparation of a special menu. In other cases, one sampling
may be enough. It is also possible that live monitoring would be the triggered result of a
first sampling detecting stress, in which case the process would be different than is
illustrated herein.
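
The monitoring rule may be sketched as a small counter. The sketch assumes that the
required samplings must be consecutive, which is one reading of "the stress is high at
each sampling"; the class and method names are hypothetical.

```python
class StressMonitor:
    """Tracks samplings and signals when a special menu should be prepared."""

    def __init__(self, required_samplings: int = 2) -> None:
        self.required = required_samplings
        self.consecutive_high = 0

    def record(self, stress_is_high: bool) -> bool:
        """Record one sampling; return True once the rule is satisfied."""
        # A sampling without high stress resets the run of high samplings.
        self.consecutive_high = self.consecutive_high + 1 if stress_is_high else 0
        return self.consecutive_high >= self.required
```

Setting `required_samplings` to 1 corresponds to the case in which one sampling is
enough.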

In this exemplary process a special menu is prepared at step 3103c and then
delivered and rendered to the client at step 3104 as an interactive voice menu. In this
case, the menu offers at least one automated menu option and includes an option to
connect to a live agent, perhaps ahead of other callers waiting for an agent because of the
detected stress level. Other factors may also be used to make a decision to intervene with
a live agent, or at least offer the option in a combined menu. These factors could include
payment history, client standing with the enterprise, the monetary value of a client's order
placed with the enterprise, and so on.
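
Placing a stressed caller ahead of other callers waiting for an agent may be sketched
with a priority queue. The ordering used here (stress first, then order value, then
arrival order) is only one possible weighting of the factors listed above, and the
class and parameter names are hypothetical.

```python
import heapq
import itertools


class AgentQueue:
    """Callers waiting for a live agent, served in priority order."""

    def __init__(self) -> None:
        self._heap: list[tuple[tuple[int, float, int], str]] = []
        self._arrival = itertools.count()  # tie-breaker: earlier callers first

    def add(self, caller_id: str, stressed: bool = False,
            order_value: float = 0.0) -> None:
        # Smaller tuples are served first: stressed callers, then larger
        # orders, then earlier arrival.
        priority = (0 if stressed else 1, -order_value, next(self._arrival))
        heapq.heappush(self._heap, (priority, caller_id))

    def next_caller(self) -> str:
        return heapq.heappop(self._heap)[1]
```

If "alice" joins first, followed by a stressed "bob" and then "carol" with a 500.00
order, the queue serves bob, then carol, then alice.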

In one embodiment of the invention the behavioral adaptation engine can be
configured to intercept every interaction with a particular application. The engine can be
adapted for multiple simultaneous interceptions and computations in a multitasking
environment. In another embodiment, the engine can spawn separate and functional
instances wherein each spawned instance is dedicated to a particular client interacting
with a particular application. The engine can utilize data queues, external resources, and
other computation modules in function. For example, the engine may intercept a client
response and make a constraint-based determination including selection of a set of
possible dialog responses, which may then be narrowed to an optimum response through
text-to-speech preprocessing based on another set of unrelated constraints before static or
dynamic caching is determined through constraint-based optimization.

In other embodiments, the engine may be configured to execute according to a
trigger constraint, wherein if the constraint is valid for a particular user the engine
intercepts the next client response. Voice applications are dynamically annotated
according to values generated from analytic results that can be optimized through
statistical analysis to provide personalized service for repeat clients using the same
application over and over. A client control could be provided and made available during
interaction to enable the voice application user to override certain enterprise responses
that were developed and served with the help of statistical development and analysis. For
example, if a client has repeatedly asked for a specific result provided by the service, so
that the result is invariably embedded into a first greeting every time the client accesses
the system, a "return to original menu" option could be provided with the result so that if
the client was not seeking the result in the current transaction the original voice
application menus could be ordered.

Fig. 32 is a block diagram illustrating basic components of behavioral adaptation
engine 2801 of Fig. 28 according to an embodiment of the present invention. Behavioral
adaptation engine (BAE) 2801 is provided, in this example, as a self-contained module
that can communicate with other modules as well as with external resources. Engine
2801 has an input block 3202 adapted to receive client data input resulting from client
interaction with a voice application. In one embodiment, the client data input is delivered
to BAE 2801 over the logical communication bus structure 2315 of voice application
server 2301 described with reference to Fig. 28 above.

Client data includes dialog response data, client history and/or status data, and
client voice samplings sent along with the client data as digital audio file attachments. In
a preferred embodiment, the client data other than voice files is XML-based data rendered
at the client portal from client voice responses.
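
Reading such a client data package may be sketched as follows. The specification does
not define the XML format, so the element and attribute names in the sample below are
invented for illustration, as is the function name. The sketch uses only the standard
Python XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical package layout; none of these names come from the specification.
SAMPLE = """\
<interaction client-id="c-1001">
  <response menu="main" option="balance"/>
  <history status="repeat" transactions="12"/>
  <attachment type="audio/wav" href="sample-0001.wav"/>
</interaction>
"""


def read_client_data(xml_text: str) -> dict:
    """Extract dialog response, client status, and audio attachment names."""
    root = ET.fromstring(xml_text)
    response = root.find("response")
    history = root.find("history")
    return {
        "client_id": root.get("client-id"),
        "option": response.get("option") if response is not None else None,
        "status": history.get("status") if history is not None else None,
        # Audio files travel as attachments; only their references are read.
        "audio_files": [a.get("href") for a in root.findall("attachment")],
    }
```

For the sample package, the reader would return the client identifier "c-1001", the
option "balance", the status "repeat", and a single audio file reference.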

Input block 3202 may comprise multiple input ports, input data queues, and
processing logic as would be the case of a standard port. In one embodiment, input block
3202 may be a bi-directional port, although it is a unidirectional port in this example.
Error communications and dialog responses sent back to interacting clients are delivered
through ports of the voice application server.

BAE 2801 has a processing logic block 3201 provided therein and adapted for
processing XML-based data and, if present, voice files attached to the XML packages.
Processing logic 3201 has a voice analyzer 3204 that is capable of receiving short audio
files and analyzing them for decibel characteristics, voice inflection characteristics, and
inclusion of certain terms or lack thereof. Analyzer 3204 contains a voice file player (not
shown) for playing the voice files for analysis.

Processing logic 3201 has an XML reader 3205 provided therein and adapted for
discerning XML-based data rendered thereto from the voice response of the client. XML
reader 3205 also receives and interprets other client input data such as manual selections
made by the client and identification data, as well as any other data that may be provided
about the client along with the client interaction data.

BAE 2801 has an input/output block 3205 provided therein and adapted to
communicate bi-directionally with external data sources including statistical and rules
databases as well as with internal system modules. In one embodiment, external data
communication between block 3205 and other systems and modules is facilitated by the
logical bus structure of the voice application server described further above. Direct
access to external sources is achieved through the resource adapter block 2310 described
with reference to Fig. 28 above. However, engine 2801 may, in some embodiments, be
adapted to access external resources directly and independently of normal server
communication. In this case, the engine would have dedicated server ports provided
thereto and adapted for communication over remote data lines.

I/O block 3205 may contain multiple input and output data queues as well as port
processing logic as is generally known to exist in data port architecture. At the heart of
BAE 2801 is a decision logic block 3203 that is adapted to make a decision as to which
available enterprise dialog response or set of responses will be identified as candidates for
a response that is embedded into or linked to a dialog rendered as a VXML page and sent
back to a client interacting with the voice application.

Decision block 3203 processes the combined results of voice analyzer 3204 and
XML reader 3205 according to one or more enterprise rules and, if applicable, external
data and statistics values, and formulates a value that identifies one or a set of candidate
enterprise dialog responses that are submitted for VXML page rendering. The decision
values may equate to, for example, one or more optional menus or menu options, links for
establishing live communication interventions, and links to other voice applications or
menus or options contained therein, including any fetched results.
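
The role of the decision block may be sketched as rule evaluation over the combined
analyzer and reader results. The sketch assumes that enterprise rules are ordered and
that the first rule whose constraint is valid supplies the candidate responses; the
specification does not prescribe this. All class, field, and response names are
hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Findings:
    """Combined results of the voice analyzer and the XML reader."""
    option: str            # option recognized from the XML-based data
    early_response: bool   # caller answered before the menu finished
    stress_high: bool      # voice analysis indicated high stress


@dataclass
class Rule:
    name: str
    applies: Callable[[Findings], bool]  # the constraint to test
    responses: list[str]                 # candidate enterprise responses


RULES = [
    Rule("stress", lambda f: f.stress_high, ["special_menu", "live_agent_link"]),
    Rule("experienced", lambda f: f.early_response, ["fetched_result"]),
    Rule("default", lambda f: True, ["next_ordered_menu"]),
]


def candidate_responses(findings: Findings, rules: list[Rule]) -> list[str]:
    """Return the responses of the first rule whose constraint is valid."""
    for rule in rules:
        if rule.applies(findings):
            return rule.responses
    return []
```

The returned identifiers stand in for the decision values that would be submitted for
VXML page rendering.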

In this example, BAE 2801 is illustrated as a self-contained module. However, in
other embodiments the functions and capabilities of BAE 2801 may be provided as a
plurality of interoperating modules that are distributed within the voice application server
domain such that they may interact with each other to accomplish the goals of the
invention. There are many possibilities.

The method and apparatus of the invention should be afforded the broadest
interpretation under examination in view of the many possible embodiments and uses.
The spirit and scope of the invention is limited only by the claims that follow.

What is claimed is:

1. A voice application creation and deployment system comprising:

a voice application server for serving voice applications to clients over a data
network;

at least one voice portal node having access to the data network, the portal node
for facilitation of client interaction with the voice applications; and

a behavioral adaptation engine executable from the application server;

characterized in that the behavioral adaptation engine intercepts client responses
during voice interaction with a served application, analyzes them for one or a
combination of behavior patterns and mood states according to pre-existing constraints
and received client information, and determines which of a set of possible dialog
responses including linked to or insertable options will be submitted for VXML page
rendering to create a next enterprise response played to the client.

2. The system of claim 1 wherein the data network is the Internet network. 

3. The system of claim 1 wherein the data network is a combination of the Internet and
telephony network.

4. The system of claim 1 wherein the behavioral adaptation engine is part of the
application logic of the voice application server.

5. The system of claim 1 wherein the at least one voice portal is an interactive voice
response system combined with a telephony server.

6. The system of claim 1 wherein the at least one voice portal is a computerized node
connected to a data network having access to the Internet.

7. The system of claim 1 wherein the behavioral adaptation engine analyzes audio files
recorded at the at least one voice portal and sent to the application server as digital audio
files attached to client responses.

8. The system of claim 1 wherein the behavioral adaptation engine executes upon receipt
of a trigger event.

9. The system of claim 1 wherein the constraints are related to one or a combination of
menu navigation behavior or perceived mood state of the client.

10. The system of claim 1 wherein the dialog responses and linked options are stored in a
data store and are accessible to the behavioral adaptation engine.

11. The system of claim 1 wherein the received client information includes one or a
combination of line identification, number identification, client history data, voice
imprint results, and recorded voice samples.

12. The system of claim 1 wherein voice sampling is used to discern mood.

13. The system of claim 1 wherein received client information is used in conjunction with
voice analysis to determine a response.

14. The system of claim 1 wherein the behavioral adaptation engine detects voice
inflection variances and volume characteristics of sampled audio to facilitate mood
discernment of a client.

15. The system of claim 14 wherein the variances and volume characteristics of an
interaction are collected over multiple interactions with a same application to develop
statistics used in gauging enterprise response probability values.

16. A behavioral adaptation engine integrated with a voice application creation and
deployment system comprising:

at least one data input port for receiving XML-based client interaction data
including audio files attached to the data;

at least one bi-directional data port for sending data to and receiving data from
external data systems and modules;

a logic processing component including an XML reader and voice player and
analyzer for processing received data; and

a decision logic component for processing result data against one or more
constraints;

characterized in that the behavioral adaptation engine intercepts client data
including dialog from client interaction with a served voice application in real time and
processes the received data for behavioral patterns and, if attached, voice characteristics of
the audio files, whereupon the engine according to the results and one or more constraints
identifies one or a set of possible enterprise responses for return to the client during
interaction.

17. The engine of claim 16 wherein the engine is hosted in a voice application server.

18. The engine of claim 17 wherein the server is hosted on the Internet network.

19. The engine of claim 16 wherein the voice application and deployment system
includes at least one voice portal for facilitation of client access to voice applications.

20. The engine of claim 16 wherein the engine is executed to function upon receipt of a
trigger event.

21. The engine of claim 16 wherein the constraints are related to one or a combination of
menu navigation behavior or perceived mood state of the client.

22. The engine of claim 16 wherein data from external data resources is used as
additional input data for decision processing.

23. The engine of claim 16 wherein the received client data includes one or a
combination of line identification, number identification, client history data, and voice
imprint results.

24. The engine of claim 17 wherein voice sampling is used to discern mood state.

25. The engine of claim 16 wherein the voice analyzer detects voice inflection variances
and volume characteristics of sampled audio to facilitate mood discernment of a client.

26. The engine of claim 25 wherein the variances and volume characteristics of an
interaction are collected over multiple interactions with a same application to develop
statistics used in gauging enterprise response probability values.

27. A method for identifying an appropriate one or set of a plurality of voice application
dialog responses to data input resulting from a client interaction with a voice application
comprising:

(a) receiving the data input during run of the voice application;

(b) interpreting the data input;

(c) analyzing the input for validity of one or more constraints;

(d) comparing the analyzed results with additional external data;

(e) analyzing the comparison results for continued validity of the one or more
constraints; and

(f) identifying one or more available response options according to the valid
constraints.

28. The method of claim 27 wherein the voice application is VXML compliant.

29. The method of claim 27 wherein in step (a) the data input includes client
identification data, client dialog data, and digital audio sampled from the dialog.

30. The method of claim 27 wherein steps (d) and (e) are optional steps.

31. The method of claim 27 wherein in step (c) the constraint validity lends to indication
of mood state determination of the client.

32. The method of claim 27 wherein in step (c) the constraint validity lends to indication
of behavioral state determination of the client.

33. The method of claim 27 wherein in step (c) the constraint validity lends to an
indication of both mood state and behavioral state of the client.

34. The method of claim 27 wherein in step (d) the external data includes statistical data
resulting from past interactions with the same dialog of the same application.

35. The method of claim 27 wherein in step (f) identification of one or more available
response options includes submitting the one or more response options to an external
module for further processing and narrower selection.

36. The method of claim 35 wherein the external module is a text-to-speech
pre-processor.